6
Measurement Error in Surveys of the Low-Income Population

Nancy A. Mathiowetz, Charlie Brown, and John Bound

The measurement of the characteristics and behavioral experience of members of the low-income and welfare populations offers particular challenges with respect to reducing various sources of response error. For many of the substantive areas of interest, the behavioral experience of the welfare populations is complex, unstable, and highly variable over time. As the behavioral experience of respondents increases in complexity, so do the cognitive demands of a survey interview. Contrast the task of reporting employment and earnings for an individual continuously employed during the past calendar year with the response task of someone who has held three or four part-time jobs. Other questionnaire topics may request that the respondent report sensitive, threatening, socially undesirable, or perhaps illegal behavior. From both a cognitive and a social psychological perspective, there is ample opportunity for the introduction of error in the reporting of the events and behaviors of primary interest in understanding the impacts of welfare reform.

This paper provides an introduction to these sources of measurement error and examines two theoretical frameworks for understanding the various sources of error. The empirical literature concerning the quality of responses for reports of earnings, transfer income, employment and unemployment, and sensitive behaviors is examined to identify those items most likely to be subject to response error among the welfare population. The paper concludes with suggestions for attempting to reduce the various sources of error through alternative questionnaire and survey design.





SOURCES OF ERROR IN THE SURVEY PROCESS

The various disciplines that embrace the survey method, including statistics, psychology, sociology, and economics, share a common concern with the weakness of the measurement process, the degree to which survey results deviate from "those that are the true reflections of the population" (Groves, 1989). The disciplines vary in the terminology used to describe error as well as in their emphasis on understanding the impact of measurement error on analyses or on the reduction of the various sources of error. The existence of these terminological differences and our desire to limit the focus of this research to measurement error suggest that a brief commentary on the various conceptual frameworks may aid in defining our interests unambiguously.

One common conceptual framework is that of mean squared error, the sum of the variance and the square of the bias. Variance is the measure of the variable error associated with a particular implementation of a survey; inherent in the notion of variable error is the fundamental requirement of replication, whether over units of observation (sample units), questions, or interviewers. Bias, as used here, is defined as the type of error that affects all implementations of a survey design, a constant error, within a defined set of essential survey conditions (Hansen et al., 1961). For example, the use of a single question to obtain total family income in the Current Population Survey (CPS) has been shown to underestimate annual income by approximately 20 percent (U.S. Bureau of the Census, 1979); this consistent underestimate would be considered the extent of the bias related to a particular question for a given survey design.
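In symbols (our notation, stating the standard decomposition the text describes), for an estimator θ̂ of a population value θ:

    MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]²,  where Bias(θ̂) = E[θ̂] − θ.

A persistent 20-percent understatement of income of the kind just described enters through the bias term; unlike the variance term, it does not shrink as the sample grows.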

Another conceptual framework focuses on errors of observation as compared to errors of nonobservation (Kish, 1965). Errors of observation refer to the degree to which individual responses deviate from the true value for the measure of interest; as defined, they are the errors of interest for this research, to be referred to as measurement errors. Observational errors can arise from any of the elements directly engaged in the measurement process, including the questionnaire, the respondent, and the interviewer, as well as the characteristics that define the measurement process (e.g., the mode and method of data collection). Errors of nonobservation refer to errors related to the lack of measurement for some portion of the sample and can be classified as arising from three sources: coverage, nonresponse (both unit and item nonresponse), and sampling. Errors of nonobservation are the focus of other papers presented in this volume (see, for example, Groves and Couper, this volume).

Questionnaire as Source of Measurement Error

Ideally a question will convey to the respondent the meaning of interest to the researcher. However, several linguistic, structural, and environmental factors affect the interpretation of the question by the respondent. These factors include the specific question wording, the structure of each question (open versus closed), and the order in which the questions are presented. Question wording is often seen as one of the major problems in survey research; although one can standardize the language read by the respondent or the interviewer, standardizing the language does not imply standardization of the meaning. In addition, a respondent's perception of the intent or meaning of a question can be shaped by the sponsorship of the survey, the overall topic of the questionnaire, or the environment more immediate to the question of interest, such as the context of the previous question or set of questions or the specific response options associated with the question.

Respondent as Source of Measurement Error

Once the respondent comprehends the question, he or she must retrieve the relevant information from memory, make a judgment as to whether the retrieved information matches the requested information, and communicate a response. The retrieval process is potentially fraught with error, including errors of omission and commission. As part of the communication of the response, the respondent must determine whether he or she wishes to reveal the information. Survey instruments often ask questions about socially and personally sensitive topics. It is widely believed, and well documented, that such questions elicit patterns of underreporting (for socially undesirable behaviors and attitudes) as well as overreporting (for socially desirable behaviors and attitudes).

Interviewers as Sources of Measurement Error

For interviewer-administered questionnaires, interviewers may affect the measurement process in one of several ways, including:

- failure to read the question as written;
- variation in interviewers' ability to perform the other tasks associated with interviewing, for example, probing insufficient responses, selecting appropriate respondents, or recording information provided by the respondent; and
- demographic and socioeconomic characteristics, as well as voice characteristics, that influence the behavior and responses of the respondent.

The first two factors contribute to measurement error from a cognitive or psycholinguistic perspective in that different respondents are exposed to different stimuli; thus variation in responses is, in part, a function of the variation in stimuli. All three factors suggest that interviewer effects contribute via an increase in variable error across interviewers. If all interviewers erred in the same direction (or their characteristics resulted in errors of the same direction and magnitude), interviewer bias would result. For the most part, the literature indicates that among well-trained interviewing staff, interviewer error contributes to the overall variance of estimates as opposed to resulting in biased estimates (Lyberg and Kasprzyk, 1991).

Other Essential Survey Conditions as Sources of Measurement Error

Any data collection effort involves decisions concerning the features that define the overall design of the survey, here referred to as the essential survey conditions. In addition to the sample design and the wording of individual questions and response options, these decisions include:

- whether to use interviewers or to collect information via some form of self-administered questionnaire;
- the means for selecting and training interviewers (if applicable);
- the mode of data collection for interviewer administration (telephone versus face to face);
- the choice of respondent rule, including the extent to which the design permits the reporting of information by proxy respondents;
- the method of data collection (paper and pencil, computer assisted);
- the extent to which respondents are encouraged to reference records to respond to factual questions;
- whether to contact respondents for a single interview (cross-sectional design) or follow respondents over time (longitudinal or panel design);
- for longitudinal designs, the frequency and periodicity of measurement;
- the identification of the organization for whom the data are collected; and
- the identification of the data collection organization.

No one design or set of design features is clearly superior with respect to overall data quality. For example, as noted, interviewer variance is one source of variability that obviously can be eliminated through the use of a self-administered questionnaire. However, the use of an interviewer may aid in the measurement process by providing the respondent with clarifying information or by probing insufficient responses.

MEASUREMENT ERROR ASSOCIATED WITH AUTOBIOGRAPHICAL INFORMATION: THEORETICAL FRAMEWORK

Three distinct literatures provide the basis for the theoretical framework underlying investigations of measurement error in surveys. These theoretical foundations come from the fields of cognitive psychology, social psychology, and, to a lesser extent, social linguistics.1 Although research concerning the existence, direction, and magnitude, as well as the correlates, of response error has provided insight into the factors associated with measurement error, there are few fundamental principles that inform either designers of data collection efforts or analysts of survey data as to the circumstances, either individual or design based, under which measurement error is most likely to be significant. Those tenets that appear to be robust across substantive areas are outlined in the following sections.

Cognitive Theory

Tourangeau (1984) as well as others (see Sudman et al., 1996, for a review) have categorized the survey question-and-answer process as a four-step process involving comprehension of the question, retrieval of information from memory, assessment of the correspondence between the retrieved information and the requested information, and communication. In addition, the encoding of information, a process outside the control of the survey interview, determines a priori whether the information of interest is available for the respondent to retrieve from long-term memory.

Comprehension of the interview question is the "point of entry" to the response process. Does the question convey the concept(s) of interest? Is there a shared meaning among the researcher, the interviewer, and the respondent with respect to each of the words as well as the question as a whole? The comprehension of the question involves not only knowledge of the particular words and phrases used in the questionnaire, but also the respondent's impression of the purpose of the interview, the context of the particular question, and the interviewer's behavior in the delivery of the question. The use of simple, easily understood language is not sufficient for guaranteeing shared meaning among all respondents. Belson (1981) found that even simple terms were subject to misunderstanding. For example, Belson examined respondents' interpretation of the following question: "For how many hours do you usually watch television on a weekday? This includes evening viewing." He found that respondents varied in their interpretation of various terms such as "how many hours" (sometimes interpreted as requesting starting and stopping times of viewing), "you" (interpreted to include other family members), "usually," and "watch television" (interpreted to mean being in the room in which the television is on).

1 Note that although statistical and economic theories provide the foundation for the analysis of error-prone data, these disciplines provide little theoretical foundation for understanding the sources of measurement error or the means for reducing it. The discussion presented here is limited to a review of cognitive and social psychological theories applicable to the measures of interest in understanding the welfare population.

Much of the measurement error literature has focused on the retrieval stage of the question-answering process, classifying the lack of reporting of an event as retrieval failure on the part of the respondent and comparing the characteristics of events that are reported to those that are not. One of the general tenets from this literature concerns the length of the recall period: the greater the length of the recall period, the greater the expected bias due to respondent retrieval and reporting error. This relationship has been supported by empirical data investigating the reporting of consumer expenditures and earnings (Neter and Waksberg, 1964); the reporting of hospitalizations, visits to physicians, and health conditions (e.g., Cannell et al., 1965); and reports of motor vehicle accidents (Cash and Moss, 1969), crime (Murphy and Cowan, 1976), and recreational activities (Gems et al., 1982). However, even within these studies, the findings with respect to the impact of the length of the recall period on the quality of survey estimates are inconsistent. For example, Dodge (1970) found that length of recall was significant in the reporting of robberies but had no effect on the reporting of various other crimes, such as assaults, burglaries, and larcenies. Contrary to theoretically justified expectations, the literature also offers several examples in which the length of the recall period had no effect on the magnitude of response errors (see, for example, Mathiowetz and Duncan, 1988; Schaeffer, 1994). These more recent investigations point to the importance of the complexity of the behavioral experience over time, as opposed to simply the passage of time, as the factor most indicative of measurement error. This finding harkens back to theoretical discussions of the impact of interference on memory (Crowder, 1976).

Response errors associated with the length of the recall period typically are classified as either telescoping error, that is, the tendency of the respondent to report events as occurring earlier (backward telescoping) or more recently (forward telescoping) than they actually occurred, or recall decay, the inability of the respondent to recall the relevant events occurring in the past (errors of omission). Forward telescoping is believed to dominate recall errors when the reference period for the questions is of short duration, whereas recall decay is more likely to have a major effect when the reference period is of long duration. In addition to the length of the recall period, the relative salience of the event affects the likelihood of either telescoping or memory decay. For example, events that are unique or that have a major impact on the respondent's life are less likely to be forgotten (error of omission) than less important events; however, the vividness of the event may lead respondents to recall the event as occurring more recently than is true (forward telescoping).

Another tenet arising from the collaborative efforts of cognitive psychologists and survey methodologists concerns the relationship between true behavioral experience and the retrieval strategies undertaken by a respondent. Recent investigations suggest that the retrieval strategy a respondent uses to provide a "count" of a behavior is a function of the true behavioral frequency. Research by Burton and Blair (1991) indicates that respondents choose to count events or items (episodic enumeration) if the frequency of the event or item is low and rely on estimation for more frequently occurring events. The point at which respondents switch from episodic counting to estimation varies by both the characteristics of the respondent and the characteristics of the event. As Sudman et al. (1996) note, "no studies have attempted to relate individual characteristics such as intelligence, education, or preference for cognitive complexity to the choice of counting or estimation, controlling for the number of events" (p. 201). Work by Menon (1993, 1994) suggests that it is not simply the true behavioral frequency that determines retrieval strategies, but also the degree of regularity and similarity among events. According to her hypotheses, events that are both regular and similar (e.g., brushing teeth) require the least amount of cognitive effort to report, with respondents relying on the retrieval of a rate to produce a response. Events occurring irregularly require more cognitive effort on the part of the respondent. The impact of different retrieval strategies on the magnitude and direction of measurement error is not well understood; the limited evidence suggests that errors of estimation are often unbiased, although the variance about an estimate (e.g., a mean value for the population) may be large. Episodic enumeration, however, appears to lead to biased estimates of the event or item of interest, with a tendency to be biased upward for short recall periods and downward for long recall periods.
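To make the contrast concrete, the toy simulation below is ours, not drawn from the studies cited; every parameter is invented. It simply encodes the claims above: episodic enumeration loses more events as the recall window lengthens but gains a few telescoped-in events, while rate-based estimation is centered on the truth but noisy.

```python
# Toy simulation (ours) of two retrieval strategies for frequency reports:
# episodic enumeration versus rate-based estimation. All parameters are
# invented for illustration, not estimated from the cited studies.
import numpy as np

rng = np.random.default_rng(1)
TRUE_RATE = 2.0  # true events per month

def episodic(months, n=50_000):
    """Count recalled events: omissions grow with the window length,
    and a few out-of-window events telescope forward into the report."""
    true_events = rng.poisson(TRUE_RATE * months, n)
    p_recall = np.exp(-0.02 * months)      # recall decays with window length
    recalled = rng.binomial(true_events, p_recall)
    telescoped = rng.poisson(0.25, n)      # events pulled into the window
    return recalled + telescoped

def estimated(months, n=50_000):
    """Apply a noisy but unbiased personal rate to the window
    (the toy rate can occasionally dip below zero; ignored here)."""
    noisy_rate = rng.normal(TRUE_RATE, 0.8, n)
    return noisy_rate * months

for months in (1, 12):
    truth = TRUE_RATE * months
    for name, reports in (("episodic", episodic(months)),
                          ("estimation", estimated(months))):
        print(f"{months:2d} mo, {name:10s}: "
              f"bias {reports.mean() - truth:+6.2f}, sd {reports.std():6.2f}")
```

Under these assumptions the episodic counts come out biased upward at 1 month and downward at 12 months, while the estimation reports remain roughly unbiased but with a standard deviation that grows with the window, mirroring the pattern described above.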

A third tenet springing from this same literature concerns the salience or importance of the behavior to be retrieved. Sudman and Bradburn (1973) identify salient events as those that are unique or have continuing economic or social consequences for the respondent. Salience is hypothesized to affect the strength of the memory trace and, subsequently, the effort involved in retrieving the information from long-term memory. The stronger the trace, the lower the effort needed to locate and retrieve the information. Cannell et al. (1965) report that those events judged to be important to the individual were reported more completely and accurately than other events. Mathiowetz (1986) found that short spells of unemployment were less likely to be reported than longer (i.e., more salient) spells.

The last maxim concerns the impact of interference related to the occurrence of similar events over the respondent's life or during the reference period of interest. Classical interference and information-processing theories suggest that as the number of similar or related events occurring to an individual increases, the probability of recalling any one of those events declines. An individual may lose the ability to distinguish between related events, resulting in an increase in the rate of errors of omission. Inaccuracy concerning the details of any one event also may increase as the respondent makes use of general knowledge or impressions concerning a class of events to reconstruct the specifics of a particular occurrence. Interference theory suggests that "forgetting" is a function of both the number and the temporal pattern of related events in long-term memory. In addition, we would speculate that interference also contributes to the misreporting of information, for example, the reporting of the receipt of Medicare benefits rather than Medicaid benefits.

Social Psychology: The Issue of Social Desirability

In addition to asking respondents to perform the difficult task of retrieving complex information from long-term memory, survey instruments often ask questions about socially and personally sensitive topics. Some topics are deemed, by social consensus, to be too sensitive to discuss in "polite" society. This was a much shorter list in the 1990s than in the 1950s, but most would agree that topics such as sexual practices, impotence, and bodily functions fall within this classification. Some (e.g., Tourangeau et al., 2000) hypothesize that questions concerning income also fall within this category. Other questions may concern topics that have strong positive or negative normative responses (e.g., voting, the use of pugnacious terms with respect to racial or ethnic groups) or for which there may be criminal retribution (e.g., use of illicit drugs, child abuse). The sensitivity of the behavior or attitude of interest may affect the encoding of the information as well as the retrieval and reporting of the material; little of the survey methodological research has addressed the point at which the distortion occurs with respect to the reporting of sensitive material. Even if the respondent is able to retrieve accurate information concerning the behavior of interest, he or she may choose to edit this information at the response formation stage as a means to reduce the costs associated with revealing the information, costs ranging from embarrassment to potential negative consequences beyond the interview situation.

Applicability of Findings to the Measurement of Economic Phenomena

One of the problems in drawing inferences from other substantive fields to that of economic phenomena is the difference in the nature of the measures of interest. Much of the assessment of the quality of household-based survey reports concerns the reporting of discrete behaviors; many of the economic measures that are the subject of inquiry with respect to the measurement of the welfare population are not necessarily discrete behaviors or even phenomena that can be linked to a discrete memory. Some of the phenomena of interest could be considered trait phenomena. Consider the reporting of occupation: we speculate that the cognitive process by which one formulates a response to a query concerning current occupation is different from the process related to reporting the number of doctor visits during the past year.

For other economic phenomena, we speculate that individual differences in the approach to formulating a response affect the magnitude and direction of error associated with the measurement process.

Consider the reporting of current earnings related to employment. For some respondents, the request to report current earnings requires little cognitive effort; it may be almost an automatic response. For these individuals, wages may be considered a characteristic of their self-identity, a trait related to how they define themselves. For other individuals, the request for information concerning current wages may require the retrieval of information from a discrete episode (the last paycheck), the retrieval of a recent report of the information (the reporting of wages in an application for a credit card), or the construction of an estimate at the time of the query based on the retrieval of information relevant to the request.

Given both the theoretical and the empirical research conducted within multiple branches of psychology and survey methodology, what would we anticipate are the patterns of measurement error for various economic measures? The answer is a function of how the respondent's task is formulated and the very nature of the phenomenon of interest. For example, asking a respondent to provide an estimate of the number of weeks of unemployment during the past year is quite different from asking the respondent to report the starting and stopping dates of each unemployment spell for the past year. For individuals in a steady state (constant employment or unemployment), neither task could be considered a difficult cognitive process. For these individuals, employment or unemployment is not a discrete event but rather may become encoded in memory as a trait that defines the respondent. However, for the individual with sporadic spells of unemployment throughout the year, the response formulation process most likely would differ for the two questions. Whereas the former task permits an estimation strategy on the part of the respondent, the latter requires the retrieval of discrete periods of unemployment. For the reporting of these discrete events, we would hypothesize that the patterns of response error evident in the reporting of events in other substantive fields would be observed. With respect to social desirability, we would anticipate patterns similar to those evident in other types of behaviors: overreporting of socially desirable behaviors and underreporting of socially undesirable behaviors.

Measurement Error in Household Reports of Income

As noted by Moore et al. (1999), the reporting of income by household respondents in many surveys can be characterized as a two-step process: the first step involves the correct enumeration of sources of household income and the second, the accurate reporting of the amount of income from each specific source. They find that response error in the reporting of various sources and amounts of income may be due to a large extent to cognitive factors, such as "definitional issues, recall and salience problems, confusion, and sensitivity" (p. 155). We return to these cognitive factors when considering alternative means for reducing measurement error in surveys of the low-income population.

Earnings

Empirical evaluations of household-reported earnings information include assessments of annual earnings, usual earnings (with respect to a specific pay period), most recent earnings, and hourly wage rates. These studies rely on various sources of validation data, including employers' records, administrative records, and respondents' reports for the same reference period collected at two different times. With respect to reports of annual earnings, mean estimates appear to be subject to relatively small levels of response error, although absolute differences indicate significant overreporting and underreporting at the individual level.

For example, Borus (1970) focused on survey responses of residents in low-income census tracts in Fort Wayne, Indiana. The study examined two alternative approaches to questions concerning annual earnings: (1) the use of two relatively broad questions concerning earnings, and (2) a detailed set of questions concerning work histories. Responses to survey questions were compared with data obtained from the Indiana Employment Security Division for employment earnings covered by the Indiana Unemployment Insurance Act. Borus found that the mean error in reports of annual earnings was small and insignificant for both sets of questions; however, more than 10 percent of the respondents misreported annual earnings by $1,000 (relative to a mean of $2,500). Among poor persons with no college education, Borus found that the broad questions resulted in more accurate data than the work history questions.

Smith (1997) examined reports of earnings among individuals eligible to participate in federal training programs. Similar to the work by Borus (1970), Smith compared reports based on direct questions concerning annual earnings with responses based on summing reported earnings for individual jobs. The decomposition approach, that is, the reporting of earnings associated with individual jobs, led to higher reports of annual earnings, attributed both to an increase in the reported number of hours worked and to an increase in the reporting of irregular earnings (overtime, tips, and commissions). Comparisons with administrative data for these individuals led Smith to conclude that the estimates based on adding up earnings across jobs led to overreporting, rather than more complete reporting.2

Duncan and Hill (1985) sampled employees from a single establishment and compared reports of annual earnings with information obtained from the employer's records. The nature of the sample, employed persons, limits our ability to draw inferences from their work to the low-income population.

2 An alternative interpretation of the findings might suggest that the decomposition approach was more accurate and that the apparent overestimation, when compared to administrative records, is due to underreporting of income in the administrative records rather than overreporting of earnings using the decomposition method.

Respondents were interviewed in 1983 and asked to report earnings and employment-related measures for calendar years 1981 and 1982. For neither year was the mean of the sample difference between household reports and company records statistically significant (8.5 percent and 7 percent of the mean, respectively), although the absolute differences for each year indicate significant underreporting and overreporting. Comparisons of measures of change in annual earnings based on the household reports and the employer records indicate no difference; interview reports of absolute change averaged $2,992 (or 13 percent), compared to the employer-based estimate of $3,399 (or 17 percent).

Although the findings noted are based on small samples drawn from either a single geographic area (Borus) or a single firm (Duncan and Hill), the results parallel the findings from empirical research based on nationally representative samples. Bound and Krueger (1991) examined error in annual earnings as reported in the March 1978 CPS. Although the error was distributed around an approximately zero mean for both men and women, the magnitude of the error was substantial. In addition to examining bias in mean estimates, the studies by Duncan and Hill and by Bound and Krueger examined the relationship between measurement error and true earnings. Both studies indicate a significant negative relationship between error in reports of annual earnings and the true value of annual earnings. Similar to Duncan and Hill (1985), Bound and Krueger (1991) report positive autocorrelation (.4 for men and .1 for women) between errors in CPS-reported earnings for the 2 years of interest, 1976 and 1977.

Both Duncan and Hill (1985) and Bound and Krueger (1991) explore the implications of measurement error for earnings models. Duncan and Hill's model relates the natural logarithm of annual earnings to three measures of human capital investment: education, work experience prior to the current employer, and tenure with the current employer, using both the error-ridden self-reported measure of annual earnings and the record-based measure as the left-hand-side variable. A comparison of the ordinary least squares parameter estimates based on the two dependent variables suggests that measurement error in the dependent variable has a sizable impact on the parameter estimates. For example, estimates of the effects of tenure on earnings based on interview data were 25 percent lower than the effects based on record earnings data. Although the correlation between error in reports of earnings and error in reports of tenure was small (.05) and insignificant, the correlation between error in reports of earnings and actual tenure was quite strong (-.23) and highly significant, leading to attenuation in the estimated effects of tenure on earnings based on interview information.

Bound and Krueger (1991) also explore the ramifications of an error-ridden left-hand-side variable by regressing error in reports of earnings on a number of human capital and demographic factors, including education, age, race, marital status, region, and standard metropolitan statistical area (SMSA). Similar to …
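The mechanism behind this attenuation can be seen in a small simulation. The sketch below is ours, not the authors' analysis; the parameters are invented and were chosen only so that the mean-reverting reporting error is negatively correlated with tenure and the estimated tenure effect comes out roughly 25 percent below the true one, the magnitude Duncan and Hill report.

```python
# Simulation (ours): mean-reverting error in reported log earnings that is
# negatively correlated with true tenure attenuates the OLS estimate of the
# tenure effect, even though the error sits on the *dependent* side.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

tenure = rng.uniform(0, 20, n)                       # years with employer
true_log_earn = 9.5 + 0.03 * tenure + rng.normal(0, 0.3, n)

# High (true) earners understate and low earners overstate their pay;
# because earnings rise with tenure, the error is correlated with tenure.
error = -0.25 * (true_log_earn - true_log_earn.mean()) + rng.normal(0, 0.1, n)
reported_log_earn = true_log_earn + error

def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x."""
    return np.cov(x, y, ddof=0)[0, 1] / np.var(x)

print("corr(error, tenure):", round(np.corrcoef(error, tenure)[0, 1], 2))
print("tenure effect, record earnings:   ", ols_slope(true_log_earn, tenure))
print("tenure effect, interview earnings:", ols_slope(reported_log_earn, tenure))
```

With these invented numbers the interview-earnings regression recovers about 0.75 of the true slope. Classical (uncorrelated) error in the dependent variable would leave the slope unbiased, which is why the correlation between the reporting error and the regressor is the crucial ingredient.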

… to correct for the expected biases. For example, if the behavior or event of interest is expected to occur on a regular basis, a question that directs the respondent to retrieve the rule and apply it to the time frame of interest, and that then probes to elicit exceptions to the rule, may be a good strategy for eliciting a numeric response.

Current versus Retrospective Reports. Current status most often is easier to report, with respect to cognitive difficulty, than retrospective status, so it is often useful to begin with questions concerning current status. Information retrieved as part of the reporting of current status also will facilitate retrieval of retrospective information.

REPAIRS FOCUSING ON PROBLEMS RELATED TO SOCIAL DESIRABILITY

Questions for which the source of the measurement error is related to the perceived sensitivity of the items or the socially undesirable nature of the response often call for question items or questionnaire modes that provide the respondent with a greater sense of confidentiality or even anonymity as a means of improving response quality. The questionnaire designer must gauge the level of sensitivity or threat (or elicit information on sensitivity or threat through developmental interviews or focus groups) and respond with the appropriate level of questionnaire modification. The discussion that follows offers approaches for questions of varying degrees of sensitivity, moving from slightly sensitive to extremely sensitive or illegal behaviors.

Reducing Threat Through Question Wording

Sudman and Bradburn (1982) provide a checklist of question approaches to minimize threat from sensitive questions. Among their suggestions are the use of open questions as opposed to closed questions (so as not to reveal extreme response categories), the use of longer questions so as to provide context and indicate that the subject is not taboo, the use of alternative terminology (e.g., street language for illicit drugs), and embedding the topic in a list of more threatening topics to reduce perceived threat, because threat or sensitivity is determined in part by context.

Alternative Modes of Data Collection

For sensitive questions, one of the most consistent findings from the experimental literature is that the use of self-administered questionnaires results in higher reports of threatening behavior.

For example, in studies of illicit drug use, the increase in reports of use was directly related to the perceived level of sensitivity: greatest for the reporting of recent cocaine use, less profound but still significant with respect to marijuana and alcohol use. Alternative modes could involve the administration of the questions by an interviewer, with the respondent completing the response categories using paper and pencil, or administration of the questionnaire through a portable cassette player and self-recording of responses. More recently, face-to-face data collection efforts have experimented with computer-assisted self-interviewing (CASI), in which the respondent reads the questions from the computer screen and directly enters the responses, and audio computer-assisted self-interviewing (ACASI), in which the questions can be heard over headphones as well as read by the respondent. The latter has the benefit of not requiring the respondent to be literate; furthermore, it can be programmed to permit efficient multilingual administration without requiring multilingual survey interviewers. In addition, both computer-assisted approaches offer the advantage that complicated skip patterns, not possible with paper-and-pencil self-administered questionnaires, can be incorporated into the questionnaire. Similar methods are possible in telephone surveys, with the use of push-button or voice recognition technology for the self-administered portion of the questionnaire.

Randomized Response and Item Count Techniques

Two techniques described in the literature provide researchers with a means of obtaining a population estimate of an event or behavior without obtaining information that can be associated with the individual. Both were designed initially for use in face-to-face surveys; it is also feasible to administer an item count approach in a telephone or self-administered questionnaire.

The randomized response technique presents the respondent with two questions, each with the same response categories, usually yes and no. One question is the question of interest; the other is a question for which the distribution of responses in the population is known. Each question is associated with a different color. A randomizing device, such as a box containing beads of different colors, indicates to the respondent which of the questions to answer; he or she simply states to the interviewer either "yes" or "no." The probability of selecting a red bead as opposed to a blue bead is known to the researcher. An example is as follows: A box contains 100 beads, 70 percent of which are red and 30 percent of which are blue. When shaken, the box presents one bead to the respondent (seen only by the respondent). Depending on the color, the respondent answers one of the following questions: (Red question) Have you ever had an abortion? (Blue question) Is your birthday in June? In a survey of 1,000 individuals, the expected number of persons answering "yes" to the question about the month of their birthday is approximately 1,000(.30)/12, or 25 persons (assuming birthdays are distributed evenly over the 12 months of the year). If 200 persons said "yes" in response to either the red or the blue question, then 175 answered "yes" in response to the abortion item, yielding a population estimate of the percentage of women having had an abortion of 175/(1,000 × .70), or 25 percent.
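The arithmetic generalizes directly. The sketch below is ours (the chapter presents only the worked example); the function name and parameters are illustrative.

```python
# Sketch (ours) of the randomized response estimator described above.
# With probability p_red the device selects the sensitive question;
# otherwise the respondent answers an innocuous question whose
# population prevalence p_known is already known.
def randomized_response_estimate(n_yes, n, p_red, p_known):
    """Estimate the prevalence of the sensitive behavior.

    n_yes   -- number of respondents answering "yes"
    n       -- total number of respondents
    p_red   -- probability the device selects the sensitive question
    p_known -- known prevalence for the innocuous question
    """
    expected_innocuous_yes = n * (1 - p_red) * p_known
    return (n_yes - expected_innocuous_yes) / (n * p_red)

# The worked example from the text: 1,000 respondents, a 70/30 box,
# June birthdays (1/12), and 200 total "yes" answers.
print(randomized_response_estimate(200, 1000, 0.70, 1 / 12))  # 0.25
```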

The item count method is somewhat easier to administer than the randomized response technique. In the item count method, two nearly identical lists of behaviors are developed: one list contains k behaviors and the other contains k+1, where the additional item is the behavior of interest. Half of the respondents are administered the list with k items and the other half the list with k+1 items. Respondents are asked simply to report the number of behaviors in which they have engaged (without indicating the specific behaviors). The difference in the mean number of behaviors reported for the two lists provides the estimate of the behavior of interest. The major disadvantage of either the randomized response technique or the item count method is that one cannot relate individual characteristics of the respondents to the behavior of interest; rather, one is limited to a population estimate.
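As with randomized response, the estimator itself is simple; the sketch below is ours, with hypothetical counts.

```python
# Sketch (ours) of the item count difference estimator. Half the sample
# sees the k-item list, half the (k+1)-item list that adds the sensitive
# behavior; each respondent reports only a total count. Data are invented.
def item_count_estimate(counts_k, counts_k_plus_1):
    """Estimated prevalence of the sensitive behavior: the difference
    between the two half-samples' mean counts."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(counts_k_plus_1) - mean(counts_k)

counts_k        = [2, 1, 3, 2, 2, 1]   # respondents shown the k-item list
counts_k_plus_1 = [3, 2, 3, 2, 3, 2]   # respondents shown the k+1-item list
print(item_count_estimate(counts_k, counts_k_plus_1))  # ~0.67
```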

CONCLUSIONS

The empirical literature addressing response errors specifically among the low-income or welfare population is limited. However, if we couple those limited findings with results based on studies of the general population, some principles of questionnaire design to minimize response error emerge. At the risk of appearing to provide simple solutions to complex problems, we speculate on some guidelines to assist in the construction of questionnaires targeted at the low-income or welfare populations.

Complex versus simple behavioral experience. One finding that is consistent throughout the literature is that complex behavioral experiences are more difficult to retrieve and report accurately than simple behavioral experiences. Despite this, questionnaire designers tend to treat all potential respondents the same, opting for a single question or set of questions on many topics, such as annual earnings or the amount of program support. One means by which to improve reporting for those persons for whom the task is most difficult is to adopt, as suggested by Schaeffer (1994), filter questions to determine the complexity of the experience, offering different follow-up questions for those with simple and those with complex behavior. For example, the person who has been employed continuously at a single job or unemployed continuously during a particular reference period easily can be identified and directed toward a different set of questions concerning earnings than the individual who has held several jobs, either concurrently or sequentially. Similarly, one can ask the respondent whether the amount of income from a particular income support program varies from month to month, with follow-up questions based on the response. Although this approach deviates from the desire to "standardize" the measurement process, it acknowledges the need for flexibility within a standardized measurement process so as to maximize the quality of the final product.

Simple, single-focus items often are more effective than complex, compound items. Whenever possible, a question should address a single concept. Questions that include "and" or "or," or that end with exclusion or inclusion clauses, often can be confusing to respondents. Although such questions often are constructed to minimize the number of questions read to the respondent (and therefore administration time), we speculate that the use of several shorter questions is more effective, from the perspective of both administration time and the quality of the data. Let's return to an earlier example:

Since your welfare benefits ended in (FINAL BENEFIT MONTH), did you take part for at least one month in any Adult Basic Education (ABE) classes for improving your basic reading and math skills, or GED classes to help you prepare for the GED test, or classes to prepare for a regular high school diploma?

One means to improve this item would be as follows:

Since (FINAL BENEFIT MONTH) have you taken any of the following classes?
An Adult Basic Education class for improving basic reading and math skills? YES/NO
A GED class to prepare for the GED test? YES/NO
A class or classes to prepare for a regular high school diploma? YES/NO

If the "one month" qualifier offered in the original question is important analytically, each "yes" response could be followed up with a probe directed at the length of the class.

Reduce cognitive burden whenever possible. Regardless of the population of interest, we know that, from a cognitive perspective, some tasks are easier to perform than others. Several means by which cognitive burden can be reduced include:

- Phrase tasks in the form of recognition rather than free recall. For example, asking the respondent to answer the question "Did you receive income from any of the following sources?" followed by a list of income sources is easier than asking the respondent to identify all income sources for the reference period of interest. Note that in asking a recognition question such as this one, the ideal format would have the respondent answer "yes/no" to each income source, so that only one item needs to be processed at a time.

- Request information that requires estimation rather than episodic recall. For example, asking for the total number of jobs held during the reference period of interest requires less cognitive effort than asking for the starting and ending dates of each job. If the latter information is needed to address analytic needs, preceding the request with an estimation question may aid the respondent's retrieval of individual episodes.

- Request information in the format or metric used by the respondent. Earnings information may be best reported when the most salient or most rehearsed metric is used. For example, the findings by Borus (1970) and Smith (1997), which indicated that a single broad-based question yielded more accurate reporting by low-income respondents than a series of questions requiring event-history-type reconstruction of earnings, may simply indicate that annual earnings are well rehearsed and more easily accessible to respondents than earnings related to any one job. One means by which to determine whether to ask the respondent about annual, monthly, or hourly earnings is to ask the respondent how he or she is best able to respond. Once again, this implies that tailoring the questionnaire to the respondent's circumstances may result in higher quality data.

- Focus on reference periods that are salient to the respondent. The 6-month period prior to exiting welfare may not be a particularly salient reference period, even though the date of termination of benefits may be quite salient. For reference periods that may not be salient to the respondent, the use of calendars or other records, coupled with the identification of landmark events within the reference period, may aid the retrieval and dating of events and behaviors.

- Provide the respondent with assistance in how to perform the task. For the most part, respondents rarely perform the kind of task we are asking them to tackle. Instructions and feedback throughout the process can clarify the task for the respondent as well as reinforce appropriate respondent behavior. An instruction indicating that the questionnaire designer is interested in all spells of unemployment, including short spells lasting less than a week, both informs the respondent and provides additional time for the respondent to search his or her memory. Should the respondent provide such information, appropriate feedback would indicate that such detailed information is important to the study. Other forms of instruction could focus the respondent on the use of a calendar or other types of records.

In addition, we know from the literature that the use of additional probes or cues stimulates the reporting of additional information. When there is interest in eliciting information from the respondent concerning short spells of employment or unemployment or odd or sporadic sources of income, repeated retrieval attempts in response to repeated questions may be the most effective approach.

In some cases, the provision of partial information may be preferable to no information from the respondent. Consider the case in which the respondent reports "don't know" in response to a question concerning earnings. One approach that has been effective is the use of broad-based followup questions in response to "don't know" items, for example, asking the respondent whether his or her earnings were more or less than a specific amount, with subsequent followup items until the respondent can no longer make a distinction (see Hurd and Rodgers, 1998).
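A minimal sketch of such an unfolding-bracket follow-up appears below. It is ours, not the Hurd and Rodgers instrument; the dollar thresholds and the answer interface are invented for illustration.

```python
# Sketch (ours) of unfolding bracket follow-ups to a "don't know"
# earnings response: ask more-than/less-than questions until the
# respondent can no longer distinguish, yielding an earnings bracket
# instead of a missing value. Thresholds are arbitrary illustrations.
THRESHOLDS = [20_000, 10_000, 30_000, 5_000, 15_000, 25_000, 40_000]

def unfolding_brackets(answer):
    """Narrow an earnings bracket from a sequence of respondent replies.

    answer -- callable taking a dollar threshold and returning
              "more", "less", or "dk" (cannot distinguish)
    """
    low, high = 0, float("inf")
    for t in THRESHOLDS:
        if not (low < t < high):   # skip thresholds outside the bracket
            continue
        reply = answer(t)
        if reply == "more":
            low = t
        elif reply == "less":
            high = t
        else:                      # "don't know": stop narrowing
            break
    return low, high

# Example: a respondent who can place earnings between $10,000 and
# $20,000 but no more precisely than that.
replies = {20_000: "less", 10_000: "more"}
print(unfolding_brackets(lambda t: replies.get(t, "dk")))  # (10000, 20000)
```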

Comprehension. The concepts of interest for many surveys of the low-income and welfare populations are fairly complex, for example, distinguishing among the various income support programs or determining whether sporadic odd jobs count as being employed. As indicated in several of the studies reviewed, research directed toward improving the comprehension of survey questions is greatly needed. For those developing questionnaires, this implies the need for iterative testing and pretesting, focusing on the interpretation of questions among members of the population of interest.

The empirical literature provides evidence of both reasonably accurate and extremely poor reporting of earnings, other sources of income, and employment on the part of household respondents. The magnitude of measurement error in these reports is in part a function of the task as framed by the question. Careful questionnaire construction and thorough testing of questions and questionnaires can effectively identify question problems and reduce sources of error.

REFERENCES

Aquilino, W. 1994 Interview mode effects in surveys of drug and alcohol use. Public Opinion Quarterly 58:210–240.
Aquilino, W., and L. LoSciuto 1990 Effect of interview mode on self-reported drug use. Public Opinion Quarterly 54:362–395.
Barron, J., M. Berger, and D. Black 1997 On the Job Training. Kalamazoo, MI: W.E. Upjohn Institute for Employment Research.
Belli, R. 1998 The structure of autobiographical memory and the event history calendar: Potential improvements in the quality of retrospective reports in surveys. Memory 6(4):383–406.
Belson, W. 1981 The Design and Understanding of Survey Questions. Aldershot, Eng.: Gower Publishing Company.
Bogen, K. 1995 Results of the Third Round of SIPP Cognitive Interviews. Unpublished manuscript, U.S. Bureau of the Census.
Borus, M. 1966 Response error in survey reports of earnings information. Journal of the American Statistical Association 61:729–738.
Borus, M. 1970 Response error and questioning technique in surveys of earnings information. Journal of the American Statistical Association 65:566–575.
Bound, J., and A. Krueger 1991 The extent of measurement error in longitudinal earnings data: Do two wrongs make a right? Journal of Labor Economics 9:1–24.

Bradburn, N. 1983 Response effects. In Handbook of Survey Research, P. Rossi, J. Wright, and A. Anderson, eds. New York: Academic Press.
Burton, S., and E. Blair 1991 Task conditions, response formation processes, and response accuracy for behavioral frequency questions in surveys. Public Opinion Quarterly 55:50–79.
Cannell, C., G. Fisher, and T. Bakker 1965 Reporting of hospitalization in the Health Interview Survey. Vital and Health Statistics, Series 2, No. 6. Washington, DC: U.S. Public Health Service.
Cannell, C., K. Marquis, and A. Laurent 1977 A summary of studies of interviewing methodology. Vital and Health Statistics, Series 2, No. 69. Washington, DC: U.S. Public Health Service.
Cannell, C., P. Miller, and L. Oksenberg 1981 Research on interviewing techniques. In Sociological Methodology, S. Leinhardt, ed. San Francisco: Jossey-Bass.
Carstensen, L., and H. Woltman 1979 Comparing earnings data from the CPS and employers' records. In Proceedings of the Section on Social Statistics. Alexandria, VA: American Statistical Association.
Cash, W., and A. Moss 1969 Optimum recall period for reporting persons injured in motor vehicle accidents. Vital and Health Statistics, Series 2, No. 50. Washington, DC: U.S. Department of Health and Human Services.
Crowder, R. 1976 Principles of Learning and Memory. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cynamon, M., and D. Camburn 1992 Employing a New Technique to Ask Questions on Sensitive Topics. Unpublished paper presented at the annual meeting of the National Field Directors Conference, St. Petersburg, FL, May 1992.
David, M. 1962 The validity of income reported by a sample of families who received welfare assistance during 1959. Journal of the American Statistical Association 57:680–685.
Dibbs, R., A. Hale, R. Loverock, and S. Michaud 1995 Some Effects of Computer Assisted Interviewing on the Data Quality of the Survey of Labour and Income Dynamics. SLID Research Paper Series, No. 95-07. Ottawa: Statistics Canada.
Dodge, R. 1970 Victim Recall Pretest. Unpublished memorandum, U.S. Bureau of the Census, Washington, DC. [Cited in R. Groves (1989).]
Dreher, G. 1977 Nonrespondent characteristics and respondent accuracy in salary research. Journal of Applied Psychology 62:773–776.
Duncan, G., and D. Hill 1985 An investigation of the extent and consequences of measurement error in labor-economic survey data. Journal of Labor Economics 3:508–532.
Forsyth, B., and J. Lessler 1991 Cognitive laboratory methods: A taxonomy. In Measurement Error in Surveys, P. Biemer, S. Sudman, and R.M. Groves, eds. New York: John Wiley and Sons.
Freedman, D., A. Thornton, D. Camburn, D. Alwin, and L. Young-DeMarco 1988 The life history calendar: A technique for collecting retrospective data. In Sociological Methodology, C. Clogg, ed. San Francisco: Jossey-Bass.

Gems, B., D. Gosh, and R. Hitlin 1982 A recall experiment: Impact of time on recall of recreational fishing trips. In Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Goodreau, K., H. Oberheu, and D. Vaughan 1984 An assessment of the quality of survey reports of income from the Aid to Families with Dependent Children (AFDC) program. Journal of Business and Economic Statistics 2:179–186.
Grondin, C., and S. Michaud 1994 Data quality of income data using computer assisted interview: The experience of the Canadian Survey of Labour and Income Dynamics. In Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Groves, R. 1989 Survey Errors and Survey Costs. New York: John Wiley and Sons.
Halsey, H. 1978 Validating income data: Lessons from the Seattle and Denver income maintenance experiment. In Proceedings of the Survey of Income and Program Participation Workshop-Survey Research Issues in Income Measurement: Field Techniques, Questionnaire Design and Income Validation. Washington, DC: U.S. Department of Health, Education, and Welfare.
Hansen, M., W. Hurwitz, and M. Bershad 1961 Measurement errors in censuses and surveys. Bulletin of the International Statistical Institute 38:359–374.
Hardin, E., and G. Hershey 1960 Accuracy of employee reports on changes in pay. Journal of Applied Psychology 44:269–275.
Hill, D. 1987 Response errors around the seam: Analysis of change in a panel with overlapping reference periods. Pp. 210–215 in Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Hoaglin, D. 1978 Household income and income reporting error in the housing allowance demand experiment. In Proceedings of the Survey of Income and Program Participation Workshop-Survey Research Issues in Income Measurement: Field Techniques, Questionnaire Design and Income Validation. Washington, DC: U.S. Department of Health, Education, and Welfare.
Horvath, F. 1982 Forgotten unemployment: Recall bias in retrospective data. Monthly Labor Review 105:40–43.
Hurd, M., and W. Rodgers 1998 The Effects of Bracketing and Anchoring on Measurement in the Health and Retirement Survey. Institute for Social Research, University of Michigan, Ann Arbor, MI.
Jones, E., and J. Forrest 1992 Underreporting of abortions in surveys of U.S. women: 1976 to 1988. Demography 29:113–126.
Keating, E., D. Paterson, and C. Stone 1950 Validity of work histories obtained by interview. Journal of Applied Psychology 34:6–11.
Kish, L. 1965 Survey Sampling. New York: John Wiley and Sons.

Levine, P. 1993 CPS contemporaneous and retrospective unemployment compared. Monthly Labor Review 116:33–39.
Livingston, R. 1969 Evaluation of the reporting of public assistance income in the Special Census of Dane County, Wisconsin: May 15, 1968. In Proceedings of the Ninth Workshop on Public Welfare Research and Statistics.
Loftus, E., and W. Marburger 1983 Since the eruption of Mt. St. Helens, has anyone beaten you up? Improving the accuracy of retrospective reports with landmark events. Memory and Cognition 11:114–120.
London, K., and L. Williams 1990 A Comparison of Abortion Underreporting in an In-Person Interview and Self-Administered Questionnaire. Unpublished paper presented at the annual meeting of the Population Association of America, Toronto, April.
Lyberg, L., and D. Kasprzyk 1991 Data collection methods and measurement error: An overview. In Measurement Error in Surveys, P. Biemer, S. Sudman, and R.M. Groves, eds. New York: John Wiley and Sons.
Marquis, K., and J. Moore 1990 Measurement errors in SIPP program reports. In Proceedings of the Annual Research Conference. Washington, DC: U.S. Bureau of the Census.
Mathiowetz, N. 1986 The problem of omissions and telescoping error: New evidence from a study of unemployment. In Proceedings of the Section on Survey Research Methods. Alexandria, VA: American Statistical Association.
Mathiowetz, N., and G. Duncan 1988 Out of work, out of mind: Response error in retrospective reports of unemployment. Journal of Business and Economic Statistics 6:221–229.
Maynes, E. 1968 Minimizing response errors in financial data: The possibilities. Journal of the American Statistical Association 63:214–227.
Mellow, W., and H. Sider 1983 Accuracy of response in labor market surveys: Evidence and implications. Journal of Labor Economics 1:331–344.
Menon, G. 1993 The effects of accessibility of information in memory on judgments of behavioral frequencies. Journal of Consumer Research 20:431–440.
Menon, G. 1994 Judgments of behavioral frequencies: Memory search and retrieval strategies. In Autobiographical Memory and the Validity of Retrospective Reports, N. Schwarz and S. Sudman, eds. New York: Springer-Verlag.
Moore, J., K. Marquis, and K. Bogen 1996 The SIPP Cognitive Research Evaluation Experiment: Basic Results and Documentation. Unpublished report, U.S. Bureau of the Census, Washington, DC.
Moore, J., L. Stinson, and E. Welniak 1999 Income reporting in surveys: Cognitive issues and measurement error. In Cognition and Survey Research, M. Sirken, D.J. Herrmann, S. Schechter, and R. Tourangeau, eds. New York: John Wiley and Sons.
Morganstern, R., and N. Bartlett 1974 The retrospective bias in unemployment reporting by sex, race, and age. Journal of the American Statistical Association 69:355–357.
Murphy, L., and C. Cowan 1976 Effects of bounding on telescoping in the National Crime Survey. In Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association.
Neter, J., and J. Waksberg 1964 A study of response errors in expenditure data from household interviews. Journal of the American Statistical Association 59:18–55.
Oberheu, H., and M. Ono 1975 Findings from a pilot study of current and potential public assistance recipients included in the Current Population Survey. In Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association.
O’Reilly, J., M. Hubbard, J. Lessler, P. Biemer, and C. Turner 1994 Audio and video computer assisted self-interviewing: Preliminary tests of new technology for data collection. Journal of Official Statistics 10:197–214.
Poterba, J., and L. Summers 1984 Response variation in the CPS: Caveats for the unemployment analyst. Monthly Labor Review 107:37–42.
Presser, S., and J. Blair 1994 Survey pretesting: Do different methods produce different results? In Sociological Methodology. San Francisco: Jossey-Bass.
Presser, S., and L. Stinson 1998 Data collection mode and social desirability bias in self-reported religious attendance. American Sociological Review 63:137–145.
Rodgers, W., C. Brown, and G. Duncan 1993 Errors in survey reports of earnings, hours worked, and hourly wages. Journal of the American Statistical Association 88:1208–1218.
Schaeffer, N. 1994 Errors of experience: Response errors in reports about child support and their implications for questionnaire design. In Autobiographical Memory and the Validity of Retrospective Reports, N. Schwarz and S. Sudman, eds. New York: Springer-Verlag.
Smith, J. 1997 Measuring Earning Levels Among the Poor: Evidence from Two Samples of JTPA Eligibles. Unpublished manuscript, University of Western Ontario.
Stinson, L. 1997 The Subjective Assessment of Income and Expenses: Cognitive Test Results. Unpublished manuscript, U.S. Bureau of Labor Statistics, Washington, DC.
Sudman, S., and N. Bradburn 1973 Effects of time and memory factors on response in surveys. Journal of the American Statistical Association 68:805–815.
Sudman, S., and N. Bradburn 1982 Asking Questions: A Practical Guide to Questionnaire Design. San Francisco: Jossey-Bass.
Sudman, S., N. Bradburn, and N. Schwarz 1996 Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.
Tourangeau, R., and T. Smith 1996 Asking sensitive questions: The impact of data collection mode, question format, and question context. Public Opinion Quarterly 60:275–304.
Tourangeau, R., L. Rips, and K. Rasinski 2000 The Psychology of Survey Response. Cambridge, Eng.: Cambridge University Press.
U.S. Bureau of the Census 1979 Vocational school experience: October 1976. Current Population Reports, Series P-70, No. 343. Washington, DC: Department of Commerce.
Warner, K. 1978 Possible increases in the underreporting of cigarette consumption. Journal of the American Statistical Association 73:314–318.
Yen, W., and H. Nelson 1996 Testing the Validity of Public Assistance Surveys with Administrative Records: A Validation Study of Welfare Survey Data. Unpublished paper presented at the annual conference of the American Association for Public Opinion Research, May.