

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




CHAPTER 1

THE SEMINAR

BACKGROUND

Desirable linkages between disciplines do not always develop spontaneously; deliberate efforts are often needed to encourage them. This report describes one such effort. The two primary disciplines involved were cognitive psychology and survey research, but other cognitive scientists and statistical methodologists also played important roles. The Advanced Research Seminar on Cognitive Aspects of Survey Methodology (CASM) was convened by the Committee on National Statistics (CNSTAT) with funding from the National Science Foundation. The seminar, held in St. Michaels, Maryland, on June 15-21, 1983, and a follow-up meeting held in Baltimore on January 12-14, 1984, were the main elements of the CASM project, whose goal was to foster a dialogue between cognitive scientists and survey researchers and to develop ideas and plans for collaborative research. This is the report of the CASM project.

The primary audience for this report consists of cognitive scientists, survey researchers, and others with substantive interests in these fields. A second audience consists of persons interested in the broad question of how to foster interdisciplinary communication and collaboration. For the benefit of the latter group, Appendix B gives procedural details: it explains how the seminar and follow-up meeting were organized and conducted.

The cognitive sciences are concerned with the study of such processes as understanding language, remembering and forgetting, perception, judgment, and inferring causes. Because all of these and other cognitive processes are important in survey research interviews, it would not be surprising to find a fairly long history of collaboration between cognitive scientists and survey researchers in problems of mutual interest. Strangely enough, however, until a few years ago members of the two disciplines appear to have had little contact.
In recent years one begins to find some convergence between the two groups, at least in the sense of discussing problems of joint concern. In 1978 the Social Science Research Council (United Kingdom) and the Royal Statistical Society sponsored a one-day seminar to discuss the problems associated with the collection and interpretation of retrospective and recall data in social surveys, with participation by psychologists and social scientists as well as survey researchers (see Louis Moss and Harvey Goldstein, editors, The Recall Method in Social Surveys, NFER Publishing Co., Ltd., 1979). In 1980 the Committee on National Statistics convened a panel on survey measurement of subjective phenomena to explore issues related to the validity and reliability of such measures. The panel included, along with survey researchers and statisticians, one cognitive psychologist and several members from the social sciences. One of the panel's recommendations called for intellectual input from social and cognitive scientists in an extensive interdisciplinary investigation of the subjective aspects of survey questions (Recommendation 12 in the panel's summary report, Surveys of Subjective Phenomena, National Academy Press, 1981; the complete report and papers of the panel, Surveying Subjective Phenomena [2 vols.], are scheduled for publication by Russell Sage Foundation in 1984).

In connection with ongoing work on the redesign of the National Crime Survey (NCS), the Bureau of Social Science Research, with support from the Bureau of the Census and the Bureau of Justice Statistics, convened a two-day workshop in September 1980 that brought together a number of cognitive scientists and survey statisticians (see Appendix C, Item 1). The discussions at the workshop focused on factors affecting the success or failure of respondents in NCS interviews in trying to recall incidents of victimization and remember details of those incidents. A number of suggestions were made that illustrated how knowledge of cognitive processes might be applied in a survey context. There was strong agreement that survey questionnaire design could benefit from the application of ideas from the cognitive sciences, and, conversely, that cognitive researchers could benefit by thinking of surveys as experiments for testing their theories. The CASM project can be regarded as a logical outgrowth of the 1980 workshop.
Those who first proposed the project believed that an effective effort to construct an interdisciplinary bridge between the cognitive sciences and survey research should have the following four characteristics:

(1) It should attempt to develop ideas and plans for collaborative research involving cognitive scientists and survey researchers.

(2) In addition to recall, which was the primary topic of the 1980 workshop, it should consider other cognitive processes that take place in survey interviews, such as comprehension and judgment.

(3) A small group of experts from the two disciplines, accompanied by a few applied statisticians and representatives of other relevant fields, should meet for an extended period to further their understanding of the areas of intersection between the cognitive sciences and survey research and to stimulate ideas for relevant research.

(4) Above all, participation in the project should offer potential benefits to members of both disciplines: for survey researchers, through the application of cognitive research to data collection problems; for cognitive scientists, through exploration of the potential uses of surveys as vehicles for cognitive research.

These four criteria provided the framework for the planning and conduct of the seminar.

There were 22 participants in the seminar, including cognitive scientists, survey researchers, applied statisticians, staff of the Bureau of the Census and the National Center for Health Statistics, and staff of CNSTAT (see Appendix D for biographical sketches of the participants). Many of the discussions were focused on a specific survey, the National Health Interview Survey (NHIS) of the National Center for Health Statistics. Prior to the seminar, participants had been interviewed in that survey, and two interviews with volunteer respondents were videotaped for viewing at the seminar.

The remainder of this chapter can be thought of as the proceedings of the seminar. It is a synthesis of the discussions at the 1983 St. Michaels seminar and the 1984 follow-up meeting in Baltimore. The views of the participants and their ideas for research are organized by topic and are not attributed to specific individuals. No citations from the literature are included, although the discussions obviously drew extensively on the findings of researchers in relevant disciplines (see the papers in Appendix A for many pertinent references). The immediate purpose of the seminar was the development of proposals for cross-disciplinary research; the format of the proceedings reflects the informal seminar format that was considered best suited for that purpose.

Chapter 2 is about outcomes. It describes research activities undertaken and research plans developed by CASM participants, individually or in small groups, after the St. Michaels meeting. Other outcomes, such as papers presented at scientific meetings or published, are also described.

Two background papers that examine relationships between the cognitive sciences and survey research were prepared and distributed ahead of time to the St. Michaels seminar participants.
A third paper, which looks at the role of record checks in measuring the validity of self-reported data in surveys and experiments, was developed from a presentation on this topic at the seminar. All three of these papers (in somewhat revised form) are included in Appendix A. As noted above, Appendix B describes the preparations for and the conduct of the seminar, Appendix C is a selected bibliography prepared for seminar participants, and Appendix D contains biographical sketches of the participants.

INTRODUCTION

During the six-day seminar at St. Michaels and the two-day follow-up meeting in Baltimore, a vast number of topics were broached, arguments aired, and suggestions made. It is difficult to do justice to the full range of the enthusiastic and wide-ranging discussions. This account attempts to strike a balance between comprehensiveness and depth. A few common threads ran through much of the discussion, and these are considered in some detail. Other ideas, though no less valuable, cannot be so easily stitched into a general pattern, and these are given more cursory treatment.

This section is organized under three headings: surveys as a vehicle for cognitive research; methods for improving surveys; and issues particularly relevant to the National Health Interview Survey (NHIS). Several of the recurring themes of the conference fall under each rubric.

Under the first rubric, for example, we note the issue of the barriers to the use of survey methods within the cognitive sciences. These barriers include differences between the methods used for research on survey methodology and those used for research in the cognitive sciences, especially cognitive psychology. For the most part, reputable survey research is carried out with probability samples that represent broad population groups; cognitive research, by contrast, is typically conducted with samples of convenience drawn from the student population at a single college or university. Survey research is usually carried out in field settings, such as the respondent's home, and involves everyday tasks, such as the recall of recent events; cognitive research often uses the controlled conditions and artificial tasks of the laboratory.

These obvious differences in research practice reflect more basic differences in what might be termed "research culture." Many surveys are carried out under government auspices and are concerned with pressing policy issues. Timeliness is often crucial. Most cognitive research, on the other hand, is conducted in academic settings and has theoretical rather than applied aims. Although researchers in both groups are concerned about error, they differ in their views of the sources of error and in their assessments of the relative importance of the various sources. Survey researchers typically use their sample data to make estimates for a target population and to calculate the sampling errors of those estimates. While they are well aware of and concerned about the potential effects of nonsampling (measurement) error on their estimates, these effects are infrequently quantified.
Cognitive scientists, in contrast, make little use of the formal apparatus of sampling theory and are only secondarily concerned about the generalizability of their findings to broad populations. However, they often include quantitative measures of nonsampling error, such as reliability or validity coefficients, in presenting their results. Even when the two groups appear to share a concept--such as the concept of attitude--it turns out on closer examination to have different meanings within the two disciplines. And when one reflects on the important subcultures within each research community, one can easily understand the difficulties in bridging the gap between survey researchers and cognitive scientists.

Differences between the practices and concepts of the survey and cognitive research communities have a number of concrete consequences for the study of topics that are of concern to both fields. Within cognitive psychology, for example, memory researchers know the items that the subjects are trying to recall, and much of their work on retrieval cues presupposes this knowledge. Survey researchers, on the other hand, do not usually know the facts that the respondents are trying to recall, and methods for providing the respondent with useful cues to retrieval under these circumstances are not well known. The question of how to help someone remember something without knowing what the something is has not yet received much attention within cognitive psychology, but it is a provocative one that arises from a consideration of the problems faced by survey research.

It would be as wrong to overstate the differences between these two research communities as to overlook them. They share many concerns, including one that encompassed most of the discussion in St. Michaels and Baltimore: What are the uses and limitations of self-report data? This central methodological question transcends the boundaries of these two disciplines and is no less relevant to clinical psychologists, historians, and sociologists than to survey and cognitive researchers.

A few other major themes are worth mentioning before we present a more detailed review. The seminar group identified a number of key issues concerning the limits and organization of memory that are relevant in the survey context. One general issue concerns ignorance about the distribution of memory ability across types of events and types of people. There is little detailed knowledge of what kinds of real-world events are likely to be remembered accurately and which are likely to be forgotten. There is similar ignorance about the distribution of memory ability in the general population, which is, of course, the population of interest in most surveys.

A related issue concerns the organization of memory. In informal settings, people talk about events using narrative or script-like structures to organize their accounts; there is evidence that memory is organized along similar lines. Clearly, this "natural" organization is quite different from the usual organization of questions in a survey questionnaire. A number of participants raised the question of how recall would be affected if interviews more closely paralleled the structure of events in memory. The clash between the organization of memory and the organization of an interview leads to another general theme--the need to understand the social psychology of the interview and the role of the interviewer.
Interviews can be a frustrating experience for respondents, who may have fairly simple stories to tell but are forced to recast them to fit the terms of the questionnaire. Questions designed to prompt complete and unambiguous answers may be seen by respondents as repetitive and tedious; they may be felt as interruptions that actually impede the smooth flow of information from respondent to interviewer.

The model of the interviewer's role that underlies much of survey research is that the interviewer should be a neutral recording device. Although this view has much to recommend it, particularly in opinion research, where it is important for the interviewer not to bias the respondent, it may be less appropriate in other types of survey research. For example, some questions place a premium on accurate recall, and interviewers might help respondents by suggesting strategies for retrieval; other questions require estimates or judgments, and here interviewers might help respondents by giving them anchors for the judgments.

The model of the interviewer as a kind of collaborator of the respondent also has implications for how an interview should be conducted. One possibility is a two-stage interview process. In the first stage, respondents would be invited to tell their stories in their own terms. In the second stage, an interviewer and a respondent would fill out the questionnaire together. The aims of such a procedure would be to humanize the interview situation and to reduce the frustration it can engender--leading, it is hoped, to fuller recall and more accurate reporting.

Another recurring theme in the work of the seminar concerned the content of the NHIS and might be labelled psychological and cultural factors in the definitions of health and illness. Respondents who see themselves as generally healthy may be more prone than others to underreport specific conditions; at the other extreme, it is plausible that respondents with serious, chronic ailments may underreport minor complaints because their threshold for counting a condition as an illness may be raised. Cultural groups may differ in the terms they use to refer to particular illnesses, or they may label as illnesses certain conditions that do not correspond to diseases in standard medical classifications. The rules for deciding whether a condition warrants medical treatment may depend on nonmedical considerations, including family roles (parents may decide for their children, wives for their husbands) and the mechanism for paying medical bills. These substantive issues were explored not only as interesting topics in their own right, but also as a means for improving the NHIS--the more that is known about the subjective side of health, the better information can be elicited about the objective side.

SURVEYS AS A VEHICLE FOR COGNITIVE RESEARCH

Cognitive researchers have neglected the survey as a research tool, and a substantial portion of the seminar discussion concerned ways to change this situation. From the point of view of cognitive psychology, survey methodology offers a number of advantages over classical laboratory methods. Well-run surveys use probability samples selected from well-defined populations, and these samples are often much larger than those used in most laboratory studies. In addition, surveys require the use of processes (such as retrieval, over very long periods of time, of information concerning naturally occurring events) that are difficult to simulate in the laboratory setting.
Consequently, some questions of considerable interest to cognitive researchers can best be studied in the context of large-scale surveys.

Collection of Survey Data on Cognitive Abilities

Data from a large national probability sample would have an immediate payoff for cognitive psychologists in the study of cognitive, especially memory, abilities. A number of questions about cognition are virtually impossible to answer without benchmark data from large, representative samples. It is not known, for example, whether memory ability generally declines with age and, if so, how sharp the decline is for different types of memory. Researchers at the seminar called for a national inventory of memory and cognitive ability to remedy this gap (see Tulving and Press, Chapter 2). The national inventory would administer a small set of standardized cognitive and memory tests to a probability sample of respondents, perhaps in conjunction with an ongoing health survey, such as the Health and Nutrition Examination Survey or the NHIS. A national inventory of cognitive and memory abilities would provide national norms for cognitive and memory skills that are needed to address theoretical questions concerning the relationship between age and ability. The norms might have an important practical application as well: it is thought that one of the first symptoms of Alzheimer's Syndrome is the deterioration of memory; age-specific norms on standardized memory tests would greatly facilitate the diagnosis of this disorder.

A related proposal called for a national survey on cognitive failures or slips. All of us are prey to such mnemonic mishaps as losing our car keys or forgetting appointments. The survey would ask respondents to report on a number of these everyday cognitive failures and would be used to develop norms for these failures. In contrast to the proposed national inventory of cognitive and memory ability, the data would be based on self-report rather than standardized measures. There has already been some experience with questions on these topics in surveys of selected population groups. The data would be useful in addressing questions concerning beliefs about memory (a topic referred to as metamemory). In conjunction with objective measures, the data could, for example, be used to determine whether older respondents believe that their memories are failing and the relationship, if any, between perceived and actual memory loss.

Other Proposed Surveys

Several other surveys on topics of interest to cognitive scientists were proposed at the seminar. One concerns the relationship between public history and private recollection (see Schuman and Converse, Chapter 2). Such a survey would examine individual interpretations of national events, like the Great Depression. It would compare the perceptions of people who lived through the events with those of people who have merely read or heard about them.
It would explore the impact of prior events on the interpretation of recent events: for example, it would seek to determine whether respondents who experienced the Vietnam War interpret its lessons differently from those who have only read or heard about it.

Another proposed survey would explore "naive" economic theories, general beliefs about how the economy works. Aside from its intrinsic interest as a study of the organization of belief systems, such a survey would examine the effect of economic beliefs on individual economic behavior and, in aggregate, on the economy.

The final proposal called for a survey of emotional experience. Such a survey would collect data on the range of emotional experiences in the general population and examine a variety of questions, such as the impact of emotion on mental and physical health and the relationship between emotions and their expression.

Surveys as Cognitive Experiments

The proposals described so far have in common the collection of survey data on issues relevant to cognitive researchers; this section describes how surveys can provide a context for experimental cognitive research.

Surveys can be viewed as large-scale experiments on cognitive processes--such as memory and judgment--in relatively uncontrolled settings. Surveys have some serious drawbacks as a setting for memory experiments. One obvious problem is the difficulty of determining in a survey whether recall is accurate, but this problem can, in many cases, be overcome. In the context of a longitudinal study, reports from later waves of the survey can be checked against presumably more accurate reports from earlier waves. Even in cross-sectional surveys, a subsample of the respondents are often reinterviewed in an effort to control the quality of data collection; these "validation" interviews provide a basis for assessing the accuracy of recall. When respondents are interviewed more than once as part of a longitudinal design or validation procedure, it is also possible for interviewers to make observations that can then be used to assess the accuracy of the respondents' reports. Sometimes records are available to help distinguish accurate from inaccurate reporting, but the difficulties in using record checks to determine the accuracy of reporting should not be underestimated; it is easy to exaggerate the degree of "forgetting" when either the records themselves or the procedures for matching records to respondent reports are not perfectly reliable (see Marquis, Appendix A).

Surveys, particularly pilot studies for surveys, often include methodological experiments as integral components, and these experiments have been underutilized by cognitive scientists as vehicles for research. Aside from opportunities for classical experiments, surveys provide a setting for quasi-experimental studies on the impact of situational factors on cognitive processes.
Information on factors affecting individual interviews could be recorded by interviewers and used to measure the effect of situational variables, such as the presence of other family members or the length of the interview, on cognitive performances, such as judgment and recall.

Surveys as a Paradigm for Research

One point made repeatedly during the seminar is that the interview is a theoretically interesting situation that is fundamentally different from the situations that usually confront respondents in laboratory settings. Several proposals--ranging from the very concrete to the very general--shared the notion that laboratory methods should be used to illuminate processes found in the interview situation.

One concrete proposal was for laboratory investigation of memory phenomena associated with reference periods. Survey researchers have found that underreporting is more marked for events that occur at the beginning of a reference period than it is for later events. Respondents who are asked to recall their health problems during the last year recall more problems that occurred during the most recent six months than during the earlier six months. They also recall more problems for the most recent six months than respondents who are asked only about problems that arose during those six months. The effect is not limited to long reference periods--it is observed even for two-week reference periods, for which underreporting is greater for events that occurred in the first week--and so appears amenable to research in laboratory settings. Another memory phenomenon observed by survey researchers but neglected by cognitive scientists is telescoping, the reporting of an event that actually occurred outside the bounds set by the reference period.

Another concrete proposal for experimental research involves studying the relationship between the organization of memory and the optimal retrieval strategy. A common design in survey research requires gathering parallel information about all members of a household. Experiments could help to determine which sequence of questions produces the fullest recall--person by person, topic by topic, or some other organization. It might also be possible to give respondents some flexibility, allowing them to choose one order or the other or to switch back and forth. Such studies would address a number of intriguing theoretical questions--How are memories for everyday events organized? What strategies do respondents use to retrieve such memories? Are some strategies more effective than others? Can reordering the questions influence the choice of strategy?--and would have practical implications for survey research as well. By the time of the Baltimore meeting, a pilot study along these lines had been conducted (see Loftus, Chapter 2) with interesting results.

A more general suggestion called for laboratory research to investigate the effects of "interfering" variables on the cognitive processes that are important in the survey setting. Laboratory experiments could, for example, examine the effect of the presence of other people on retrieval or comprehension. A number of such situational factors commonly present in survey interview settings are thought to affect cognitive processes, but there has been little systematic investigation of their impact.
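The logic of the question-order experiments described above can be illustrated with a small simulation. The sketch below is purely hypothetical and is not part of the seminar's work; it is written in modern Python, and the numbers (a baseline recall probability and a "cluster boost") are invented for illustration. It assumes that household memories are organized by person, so that a question enjoys a recall boost when it concerns the same person as the preceding question, and then compares person-by-person and topic-by-topic question orders under that assumption.

```python
import random

random.seed(0)

PERSONS, TOPICS = 3, 4     # household of 3, with 4 questions per person
P_BASE = 0.60              # assumed baseline probability of recalling a fact
CLUSTER_BOOST = 0.25       # assumed boost for staying with the same person

def recall_rate(order, n_respondents=2000):
    """Mean fraction of facts recalled under a given question order.

    `order` is a list of (person, topic) pairs.  Memory is assumed to be
    organized by person, so a question gets the boost only when the
    previous question concerned the same person."""
    total = 0.0
    for _ in range(n_respondents):
        recalled = 0
        prev_person = None
        for person, _topic in order:
            p = P_BASE + (CLUSTER_BOOST if person == prev_person else 0.0)
            if random.random() < p:
                recalled += 1
            prev_person = person
        total += recalled / len(order)
    return total / n_respondents

# The two question sequences compared in the proposal.
person_major = [(p, t) for p in range(PERSONS) for t in range(TOPICS)]
topic_major = [(p, t) for t in range(TOPICS) for p in range(PERSONS)]

r_person = recall_rate(person_major)
r_topic = recall_rate(topic_major)
print(f"person-by-person order: {r_person:.3f}")
print(f"topic-by-topic order:   {r_topic:.3f}")
```

Under this toy model the person-by-person order yields fuller recall simply because it keeps consecutive questions within one memory cluster; if memory were instead organized by topic, the advantage would reverse. Which organization actually holds is exactly the empirical question the proposed experiments were meant to settle.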
Survey research suggests not only new phenomena for cognitive researchers to investigate, but also new methods of investigation. Since Ebbinghaus initiated the scientific study of memory in the 1880s, memory researchers have relied on a single paradigm: the experimenter has control over the information to be remembered and, in consequence, knows exactly what the subjects are trying to recall. In contrast, survey researchers typically do not know what the respondents know. A new type of memory research was suggested that would parallel the survey situation; in the new paradigm, the experimenter who structures the memory task would know only in a general way what material the subjects have learned. The aim of such research would be to determine whether experimenters can provide retrieval cues or suggest retrieval strategies that enhance recall without having exact knowledge of the material to be remembered. A related suggestion called for memory research concerning everyday events for which records (or some other means of validation) are available. Memories for financial transactions, for example, can often be checked against entries in a checkbook; on a college campus, memories concerning doctor visits can often be compared with campus clinic records.

Not all of the proposed cognitive research concerned memory. One proposal focused on the study of judgment processes. Surveys often ask respondents to make estimates, or judgments, regarding how frequently they have engaged in a particular behavior. An NHIS supplement, for example, asks respondents how many alcoholic beverages they drink in a typical two-week period. Psychologists have long been interested in the processes by which such judgments are made. One line of investigation has demonstrated that judgments are affected by the use of "anchors," which serve as starting points for the judgment. In answering a question about their typical drinking behavior, respondents may try to recall how much they have drunk recently. This recollection is the basis for a preliminary estimate, or anchor, which is then adjusted upwards or downwards to reflect behavior during the typical period. When the anchor is misleading or the adjustment insufficient, the final estimate will be thrown off. Can the estimates be improved by providing accurate anchoring information? Survey researchers have balked at the idea of providing information, such as anchors, that might bias the respondents (a carryover, perhaps, from opinion research settings); the research on judgment indicates that respondents may bias themselves by generating misleading anchors. A line of research was proposed in which the information provided to respondents would be varied systematically: some respondents would be given detailed information about the distribution of responses, some would be given information about the mean response, and some would be given no information at all. The aim of the research would be to determine whether the information provided increases the accuracy of judgments.

Summary

The survey method is a research tool that cognitive psychologists have neglected. Survey data can be collected on topics of interest to cognitive psychology, such as the distribution of cognitive abilities in the general population and the intersection of public and personal history.
Surveys can provide a vehicle for experimental and quasi-experimental studies on cognitive processes in relatively uncontrolled settings. Finally, survey findings suggest new phenomena and new research paradigms for cognitive researchers to explore in laboratory settings.

IMPROVING SURVEY METHODS

If survey research has much to offer the cognitive sciences, then the proposals made at the seminar indicate that the cognitive sciences also can contribute to survey methodology. The proposals are grouped into four categories: general strategies for improving survey methods; cognitive research that has special relevance to survey methodology; issues calling for further methodological research; and research tools that might be applied profitably to questions concerning survey methods.

General Strategies

One of the suggested general strategies for improving surveys was to include methodological research as a component of every large survey. Other researchers bemoaned the "morselization" of methodological research. It is not enough to catalogue an ever larger number of response effects in surveys; instead, research on response effects must be more systematic and quantitative--survey researchers need to know not only what the potential problems are, but also when they are likely to arise and how seriously they will bias the results. In addition, the impact of response effects or other measurement errors must be incorporated into assessments of the reliability of survey results. Statistical models have been developed to measure the likely effects of sampling error; similar models are needed to assess the impact of measurement error. One approach considered by the seminar group was to treat survey items as a selection from a population of potential items; standard errors for survey estimates would then reflect both the sampling of respondents and the sampling of items.

Many of the participants were struck by how much a standardized interview differs from the acquisition of information in a normal conversation. It was proposed that survey instruments be organized to follow the same principles that work well in everyday conversation. A prerequisite, then, for the design of a survey instrument would be the study of ordinary conversations about the survey topic. This proposal was related to two broader concerns. The first is the apparent frustration of respondents at the artificiality of the typical survey interview. Interviews that were structured more like conversations would be "humanized"--less mechanical for both respondents and interviewers. The other broad concern is poor recall.
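Returning briefly to the measurement-error models discussed above: the idea of treating survey items as a selection from a population of potential items can be made concrete with a small Monte Carlo sketch. The code below is a hypothetical illustration in modern Python, not something developed at the seminar, and all of its population parameters are invented. Each potential item carries its own systematic bias; replicating the whole survey shows that the standard error when both respondents and items are sampled exceeds the standard error computed as if the chosen items were fixed.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: each respondent has a true score, and each
# potential questionnaire item measures that score with its own bias.
N_RESP, N_ITEMS = 10_000, 200
true_score = [random.gauss(50, 10) for _ in range(N_RESP)]
item_bias = [random.gauss(0, 3) for _ in range(N_ITEMS)]

def survey_estimate(n_resp=100, n_items=5):
    """Mean response when both respondents and items are sampled."""
    resp = random.sample(range(N_RESP), n_resp)
    items = random.sample(range(N_ITEMS), n_items)
    answers = [true_score[r] + item_bias[i] + random.gauss(0, 2)
               for r in resp for i in items]
    return statistics.mean(answers)

# Replicate the whole survey many times to see the total variability.
total_se = statistics.stdev(survey_estimate() for _ in range(500))

# For comparison, fix one set of items so that only respondents vary,
# as a conventional sampling-error calculation implicitly assumes.
fixed_items = random.sample(range(N_ITEMS), 5)

def fixed_item_estimate(n_resp=100):
    resp = random.sample(range(N_RESP), n_resp)
    answers = [true_score[r] + item_bias[i] + random.gauss(0, 2)
               for r in resp for i in fixed_items]
    return statistics.mean(answers)

resp_only_se = statistics.stdev(fixed_item_estimate() for _ in range(500))

print(f"SE, respondents and items sampled: {total_se:.2f}")
print(f"SE, respondents only:              {resp_only_se:.2f}")
```

The gap between the two standard errors is the contribution of item sampling, which conventional sampling-error formulas ignore; a standard error that reflected both sources would be the wider of the two.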
A general hypothesis that emerged during the conference was that survey questionnaires might induce more accurate recall if their organization paralleled the organization of the experience in memory. The flow of ordinary conversation would provide a good indication of how memories for a class of events are organized.

Relevant Research from the Cognitive Sciences

As is already apparent, a number of areas of investigation from the cognitive sciences were seen as particularly relevant to survey methodology. One such area is research on scripts and schemata. In the cognitive sciences, most researchers share the view that the interpretation and memory of experience is governed by higher-level knowledge structures. These higher-level structures, referred to variously as scripts, frames, or schemata, codify shared knowledge about classes of things or events. For example, there may be a script for visits to the doctor, which represents general assumptions about why people go to the doctor and the sequence of events that a doctor's visit usually comprises. These scripts may vary from person to person depending on such variables as type of health care system. For example,

give some atypical examples ("include chiropractors") as well as some "non-examples" ("exclude physical therapists"). With familiar categories that have sharply defined boundaries, examples may be unnecessary. Sometimes visual aids can be used to reduce confusion. A supplement to the NHIS asks respondents to estimate their drinking behavior in terms of ounces. It would help to show them glasses of different sizes with the capacities labelled, but even this procedure would still leave room for other ambiguities. The abstract terms used in attitude questions raise even more difficulty for comprehension. Terms like "social programs" probably mean as many different things as there are different respondents. Even for a single respondent, the same term may evoke different meanings on different occasions. One reason that question order may affect results is that earlier questions can provide an interpretive context for later ones. "Social programs" may be interpreted one way in a series of questions about waste in government and another way in a series of questions about the problems of disadvantaged people. Further, in attitude questions it may not be possible or even desirable to separate the meaning of a term from its evaluation--part of what it means to have an attitude is to have a propensity to view the object of the attitude in a particular light. Clearly, further research is needed to determine how respondents interpret and answer attitude questions. (For some suggested research on these issues, see Tourangeau et al., Chapter 2.)

Recall

One of the central problems of surveys is that survey results are often no more accurate than the memories of the respondents. The question of how to improve recall was perhaps the central question of the seminar. We have already noted that the sequence of questions in a questionnaire may affect the accuracy and completeness of recall. Several specific studies were proposed to compare different question orders.
With life-event histories, a topical order could be compared with a chronological order; with household interviews, such as the NHIS, a person-by-person order could be compared with a topical order. Items regarding individual events might be ordered to reflect the script for that class of events. Even when questions follow a chronological organization, it may make a difference if recall proceeds from the most recent events to the least recent rather than in the opposite order (see Loftus, Chapter 2). Question order relates to another concept from cognitive psychology, the concept of proactive inhibition. When subjects in memory experiments are asked to recall lists of related items, performance gets worse on the later lists, a phenomenon referred to as the build-up of proactive inhibition. When the items on later lists bear little similarity to those on the earlier lists, the effect disappears; the effect can also be reduced by increasing the time interval between trials. These findings suggest that research might be carried out to see whether periodic changes of topic or rest periods would promote fuller recall in interviews.

Many questions for further research concerned the reference period. Research indicates that events may be dated more accurately if they can be tied to some landmark event. Would it be helpful, therefore, to give respondents a warm-up period during which they would think about where they were and what they were doing during the reference period? Even if they did not recall personal landmarks during the warm-up, respondents might be encouraged to think about general spatial and temporal cues that could facilitate recall. Researchers also suggested several variations on the current definition of the reference period. A three-week period could be used (instead of the current two); all episodes would then be dated and only those in the two more recent weeks would be retained. The use of such an extended reference period might reduce underreporting, which is thought to be greater at the beginning of a reference period. In another variation, a rolling reference period would be tried; rather than reporting about a period defined by fixed dates, respondents would report about the two weeks prior to the interview. Another tack for possibly improving respondent recall involved forewarning respondents about the content of the interview. For example, with computer-assisted telephone interviewing (CATI), it would be possible to contact respondents at the beginning of the reference period. This initial contact would provide respondents with a landmark for dating events; it also would provide an opportunity to suggest strategies for improving recall (e.g., noting doctor visits on a calendar, thinking about health problems every night). Even if CATI were not being used, an advance letter could include the forewarning and suggestions for memory aids. Another area for research proposed above, regarding the use of examples, also has implications for respondent recall; examples and lists not only illustrate the meaning of a question, but also serve to prod recall.
It is unclear whether atypical members of a category are especially hard to recall or are especially memorable; the overall efficacy of different types of examples may depend on their effect on memory. The work on part-set cuing indicates that, in some cases, less is more--too many examples can inhibit recall.

Judgment

Many survey questions require some judgment or estimate from respondents. Human judgments are, of course, fallible, and it is natural to ask how they are made and how they can be improved. One suggestion was made repeatedly: experiment with giving respondents a chance to revise their initial answers, or asking them for a second estimate. Anecdotal evidence suggests that an adjusted answer or second estimate is often more accurate. A related line of work concerns the use of qualitative, controlled feedback, in which respondents are informed about the reasons cited by other respondents for making a quantitative judgment. Respondents become more confident when they hear other respondents' reasons, but not their numerical assessments. Second-chance methods might also be used with questions that rely more on memory than judgment;

respondents could be asked at the end of an interview whether they had recalled additional events they had not reported earlier. Some types of judgments present special problems. Attitude questions usually require an evaluative judgment, and little is known about how such judgments are made. Is memory first searched for information about the attitude object? What is the "attitude space" that is searched? What is retrieved? How is the information combined to produce the final judgment? The answers to these questions are simply not known. One familiar type of opinion item asks respondents to list issues according to their importance (e.g., What are the most important problems facing our nation today?). Respondents commonly omit problems that are important but not particularly salient (e.g., nuclear war). Once again, little is known about what determines the relative salience of different issues. Another type of judgment that presents special difficulty is the estimation of probability. Research has shown that probability estimates are often at odds with the dictates of probability theory and that the probabilities of rare events are often greatly overestimated. In addition, probability estimates are known to be sensitive to both the framing of questions and the type of response scale that is presented to respondents. Aside from research to improve understanding of how respondents make particular judgments, two general approaches to reducing respondent error were suggested for investigation. The first approach involves suggesting strategies to respondents for making the estimates. Some estimation strategies are better than others, and strategies that are known to reduce error could be suggested to respondents. One proposal along these lines has already been mentioned--give respondents the mean as an anchor for their individual estimates. The second approach involves collecting and using what might be termed ancillary information about the estimate.
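What "using" ancillary information--such as a respondent's stated confidence in an answer--might mean statistically was left open. Purely as an illustration, one naive possibility is to weight each answer by its stated confidence; the function and the rating scale here are our assumptions, not a method proposed at the seminar:

```python
def confidence_weighted_mean(answers, confidences):
    """Estimate a population mean giving more weight to answers the
    respondents were more confident about (confidences on, say, a
    1-5 scale). Sensible only if confidence actually correlates with
    accuracy--hence the prerequisite validation study noted in the text."""
    total = sum(confidences)
    return sum(a * c for a, c in zip(answers, confidences)) / total

# Two respondents report 2 and 4 doctor visits, with confidence 1 and 3:
print(confidence_weighted_mean([2, 4], [1, 3]))  # → 3.5
```

Any such adjustment would have to be calibrated against record checks before the weighted estimate could be preferred to the unweighted one.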
Respondents might give their answers and then rate their confidence in them. The confidence ratings might then be incorporated into the survey results. The nature of the statistical procedure for incorporating ancillary data remains to be worked out. A further prerequisite to the use of such adjustments would be a study that assesses the correlation between confidence ratings and the accuracy of reports, perhaps using record checks to evaluate accuracy. Other ancillary measures relating to respondent error could be collected. Some of the suggestions included incorporating "lie" scales (to measure the propensity of respondents to give clearly invalid answers; one such scale is used in the Minnesota Multiphasic Personality Inventory), "denial" scales (to measure the propensity to deny or minimize symptoms), and questions assessing item sensitivity ("Would you be embarrassed to report . . . ?"). As with confidence ratings, it would be necessary to assess the validity of the ancillary data and to develop statistical procedures for incorporating them. We simply do not know much about how respondents answer survey questions, and this ignorance was an undertone in much of the discussion about judgment. Questions that are intended to trigger memories for specific events may, in fact, elicit estimates based on more general knowledge. A study carried out by one of the participants between the

two meetings of the seminar group (see Ross, Chapter 2) indicated that respondents have little more confidence in their answers to questions about past behaviors than in their answers to questions about the future. This finding suggests that both sets of answers may be produced through a similar process that relies more on judgment than memory.

Interviewer Behavior

Although much of the discussion focused on the respondent as a source of nonsampling error, the interviewer and the interview situation were also seen as potential areas for improvement in survey methods. With respect to interviewer performance, a number of points merited further study. There is much to be learned from good interviewers; several proposals were aimed at learning more about their characteristics and techniques. It is possible to observe interviewers at work by videotaping interviews or by making arrangements for interviews with "planted" respondents. (Both of these techniques were used in connection with the seminar.) Interviewer effectiveness can be rated on a number of dimensions, including objective performance measures--such as completion rates, item nonresponse rates, and editing error rates--as well as more subjective ones--such as warmth, voice quality, appropriateness of probes, and methods for coping with respondent fatigue. Perhaps the methods used by good interviewers could be taught to all interviewers. On the other hand, if good interviewers are born and not made, videotapes could be used to identify characteristics that might serve as criteria in selecting and hiring new interviewers. The drive to standardize interviewer behavior has left interviewers little room for discretion. One proposal called for a comparison of different levels of interviewer discretion in dealing with respondent uncertainty.
Some interviewers would be instructed not to give any clarification to respondents, some would be given standardized instructions for giving clarification, and some would be given the freedom to decide how much clarification to give. Interviewer discretion might reduce bias but increase interviewer variance. Both effects would have to be measured and the tradeoff weighed. Interviewers could also be given some discretion in determining question order (e.g., topical versus chronological); it would be interesting to see how interviewers would order the questions if they were free to choose. CATI systems may offer a good method for providing interviewers with some discretion; CATI questionnaires could have alternative branching structures for respondents who show a preference for one order over the other, and it would be up to the interviewer to decide which branch to follow. Several of these proposals imply a conception of the interviewer that differs sharply from the prevailing view. Rather than seeing interviewers as a kind of neutral recording device, they might be viewed as collaborators with the respondents, helping out in various ways. One could conduct experiments in which interviewers would suggest strategies for retrieval and estimation, provide anchoring information for judgments, and exercise discretion in administering interviews. Accuracy rather than absolute standardization would be the aim of such approaches.

The Interview Situation

A variety of research issues were identified that deal with aspects of the interview situation. They range from the effect of different interviewing modes to respondent attitudes toward interviews. The issue of the mode of data collection (telephone, face-to-face, or self-administered) is particularly urgent for the NHIS, which is committed to a mixed approach, with some interviews being conducted over the telephone and the rest in person. One question is how to gain the cooperation of respondents in the critical first few seconds of the telephone call. Another concerns the use of incentives: Would respondents feel committed to participating in the interview if they were sent a payment or reward in the advance letter? These questions about gaining respondent cooperation and the use of incentives relate to broader concerns about respondent motivation. Not everyone views an interview in the same light or approaches it with the same motives. Different views may be systematically related to demographic or cultural variables. Poor respondents, for example, may see the survey interview in the same terms as an intake interview for welfare--a view that richer respondents are unlikely to share. These differences among subgroups in attitudes toward the interview could be assessed in a study in which people rated the similarity of the interview situation to other situations. Another approach to subgroup differences in respondent motivation and behavior assumes that, because interviews are a familiar part of contemporary society, people have probably developed rules for appropriate interview behavior. Possible rules might include not bringing up topics unless they are first mentioned by the interviewer and not asking for clarification. Different subgroups may follow different rules.
Implicit in this discussion of subgroup differences is the notion that the ways in which respondents view the interview will affect the level of their cooperation, the amount of deliberate withholding, and the accuracy of their answers. The discussion about respondent motivation reflected concerns not only about response error but also about nonresponse error. One general strategy to reduce nonresponse is to explore why people answer survey questions rather than why they refuse. A number of motives were suggested--the desire to be helpful, a sense of duty, the wish to present oneself in a favorable light; it is probable that different respondents cooperate for different reasons. From the point of view of survey researchers, it is not the case that all motives are equally desirable: a respondent who wants to be maximally informative is preferable to one who wants to make the best impression. Because the trend in surveys is toward longer interviews, there is considerable interest in finding ways to maintain the level of respondent motivation over the course of an interview. It would be helpful to know how respondents' moods and attitudes change during interviews and how these changes affect data quality. It is common in laboratory experiments on memory to restrict testing to 45 minutes or an hour on the grounds that performance over longer periods may deteriorate from fatigue. Experiments to quantify the fatigue effect would provide useful data for both cognitive scientists and survey researchers.

One hypothesis about how motivation changes during an interview is that motivation starts out high but then wanes. At the beginning of an interview, most respondents probably start out with a broad criterion for reporting events; even if they are not sure the events are appropriate, they report them anyway. As an interview wears on and respondents learn the consequences of reporting, their criterion probably narrows. Supplements and other material toward the end of a questionnaire may, therefore, be particularly prone to underreporting. Interviewers may be susceptible to a form of the same problem and neglect to note down trivial conditions. One possible remedy for such criterion shifts is to identify all the relevant conditions or events at the start of the interview, before details are collected on any one condition or event. Respondent motivation and performance may be affected by the pace as well as the length of the interview. Various methods of changing pace could be compared (such as longer questions, rest periods, or multiple sessions in which respondents are contacted), particularly for their effects on recall. Several participants called for research on procedures that might increase respondent motivation by humanizing the interview situation. One humanizing method might be to reduce the standardization of the interview. Earlier we noted several proposals that would examine the effects of allowing interviewers greater latitude. There were, in addition, suggestions to try tailoring questionnaires to different subcultures or to individual respondents with different scripts for dealing with a topic. In the NHIS, one version of the questionnaire might be suitable for respondents with minor medical problems, another for respondents with serious chronic conditions. Organizing question orders according to conversational principles would reduce the inflexibility that can result from standardization. The most radical proposal along these lines was to try allowing respondents to tell their stories before any detailed questions are asked.
The interview would begin as a conversation in which respondents were asked a few general questions (e.g., about their family's health and recent medical problems); respondent and interviewer would then work together to fill out the questionnaire. A household survey like the NHIS affords some flexibility in the choice of respondent; several researchers offered hypotheses about who that respondent should be. For some purposes, the best respondent for the household may be the person who pays the bills; for others, it may be the "gatekeeper" (e.g., the person who makes the appointments with the doctor). Some events may be better recalled by children--the first few experiences in a category are often easiest to recall. It is not always necessary to select a single respondent. On the presumption that several heads are better than one, it may be useful to have several household members present during the interview--what one member forgets, another may recall. On the other hand, household members may distract each other, reducing recall, and respondents may be less willing to answer sensitive questions when other members of the household are present. It is possible to conduct independent interviews with several members of a household to assess the reliability of their reports. Clearly, more research is needed to determine how to take advantage of the fact that the NHIS is a household interview, in which

several persons--or combinations of persons--may serve as respondents. A start could be made by collecting ancillary data on who gave answers to which questions during an interview.

Tools for Methodological Research

A review of the methods proposed for carrying out methodological studies will help to summarize the discussion of methodological issues. Many of the proposed studies were conceived as experiments that would compare different question orders, levels of interviewer discretion, or respondent rules. Such split-ballot studies, in which portions of the survey sample are randomly assigned to different treatments, have a long history in survey research. Other studies were seen as quasi-experiments. In these studies, natural variations in interview length or setting (e.g., the presence of other household members) would be measured and related to differences in the quality of the data. A number of the proposals concerned the processes by which survey questions are understood and answered. Random probes inserted immediately after a question can be used to study how respondents interpret the terms in the question. Protocol analysis can be used to investigate the processes respondents use when they answer survey questions. Respondents would think out loud as they answered questions, transcripts would be made, and those transcripts analyzed for clues as to process. Protocol analysis was seen as particularly useful in identifying strategies for answering questions requiring an estimate or judgment. A related technique is debriefing the respondent after the interview has been completed. Such postinterview debriefings can be a very useful method for understanding how respondents interpreted survey questions and for clarifying the meaning of their responses. Other proposals focused on the interview process; videotapes were seen as an invaluable tool for research on this process.
Videotapes could be used to study the relationship between interviewer characteristics and techniques, on one hand, and measures of interview quality (e.g., item nonresponse), on the other. The effects of interactions between household members during the interview could also be explored. Participants at the seminar had themselves viewed a videotaped NHIS interview, and this experience may provide a model for future research endeavors. Videotaped interviews are clearly provocative tools that can stimulate active collaboration between cognitive scientists and survey researchers. Most of the proposed research concerned nonsampling errors in surveys; several techniques were suggested for assessing the magnitude of nonsampling errors. Respondent reports can sometimes be checked against administrative records, although a number of pitfalls in record-check studies (e.g., errors in the records) can bias the results. Sometimes it is possible for interviewers to make direct observations that can provide a basis for assessing response errors, and sometimes reinterviews can be used to explore the reliability of the interview process. A final method that was proposed involved including measures of validity (such as lie scales or measures of confidence) in survey instruments. These

measures could be used to adjust survey estimates or they could be incorporated into estimates of total survey error. One major source of error in estimates is underreporting: one proposal called for the development of mathematical models to estimate the amount of underreporting; the model might embody assumptions about the incidence of events of different types and the forgetting curves for each type. A final set of proposals suggested combining several methods of research on surveys. Researchers might begin with laboratory research on judgment, for example, and then conduct split-ballot experiments to compare several methods for improving the judgments of respondents in surveys. Ethnographic studies could be used to explore variations in terminology or to determine what groups of people are excluded by survey samples. (The Census Bureau has employed similar ethnographic studies to assess undercoverage in the decennial census.) Finally, a cross-disciplinary team could study a few families intensively. These families would be interviewed and videotaped over long periods of time; family records would be checked and direct observations made. The cross-disciplinary method would establish an upper limit on the quality of information available and could be used as the standard for assessing the shortcomings of questionnaire data.

ISSUES FOR THE NATIONAL HEALTH INTERVIEW SURVEY

A good part of the discussion centered on issues specific to the NHIS, especially issues of content. A general question concerned how the data, especially data on conditions, are used. Several researchers noted omissions from the NHIS and called for items on emotional stress and mental illness. If the NHIS is viewed in part as a survey of attitudes, then the most serious omission is the area of conceptions of health and illness.
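Returning briefly to the proposal above for mathematical models of underreporting: such a model might be sketched as follows. The exponential forgetting curve, the constant incidence, and the parameter values are illustrative assumptions only, not estimates from the seminar:

```python
import math

def underreporting_rate(incidence_per_week, decay, weeks):
    """Expected fraction of events missed over a reference period of
    `weeks` weeks, assuming a constant incidence per week and an
    exponential forgetting curve:
        P(report | event is t weeks old) = exp(-decay * t).
    A different curve or incidence schedule could be substituted for
    each type of event, as the proposal suggests."""
    actual = incidence_per_week * weeks
    reported = sum(incidence_per_week * math.exp(-decay * t)
                   for t in range(weeks))
    return 1 - reported / actual

# With no forgetting, nothing is missed; with decay, older events drop out.
print(underreporting_rate(5, 0.0, 2))  # → 0.0
print(round(underreporting_rate(5, 0.5, 2), 3))
```

Fitting the decay parameter for each event type against record-check data would turn a sketch like this into the estimation tool the proposal envisions.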
NHIS supplements could provide answers to many questions about the subjective side of health: What conditions do people include under the headings of health, illness, and injury? What health-related conditions are regarded as nonevents? How do the schemata or scripts for one kind of health event (e.g., an injury) differ from those for other kinds (e.g., an acute illness or chronic condition)? How do different subcultures differ in their conceptions of health and illness and how do their taxonomies for illness differ? How do emotional states affect physical health? We group the issues specific to the NHIS according to content areas of the questionnaire: utilization of medical services, health conditions, and restrictions in activity. Then we turn to a single item in the questionnaire that asks respondents for an overall rating of their health.

Utilization

Researchers identified two major issues regarding the items on use of medical services--the process by which people decide to seek help and underreporting of utilization. A number of factors determine when

someone decides to seek help: among those suggested are the nature of the condition or problem, the person's view of the medical system and how he or she relates to it, and the mechanism for paying for medical care. People may "schedule" their illnesses when psychologically convenient and they may seek help when it is convenient or just before it becomes especially inconvenient. The relationship between a person and the medical system may be mediated by a household gatekeeper, the person who usually calls the doctor and makes the appointments for the family. These issues are interesting in their own right, and they have implications for questions of survey methodology, especially underreporting, as well. For example, questions that focus on the decision-making process might reduce underreporting of medical utilization. They would also aid in identifying the people in a household who are most knowledgeable about utilization--the decisionmakers, the gatekeepers, the people who pay the bills or fill out the insurance forms. The reference period is also relevant to the issue of underreporting. For hospitalizations, the NHIS has used a 13-month reference period in some years, a 12-month reference period in others, and in a recent pretest, a 6-month period. It would be worthwhile to examine the estimated distributions of discharges by month under the different reference periods and to compare them to estimates based on hospital records.

Conditions

The NHIS asks respondents a series of questions concerning medical conditions. One problem with these items is the terminology itself. Respondents may know they have a problem (for example, a bad back) without knowing the appropriate medical term for it. Self-report data might be more accurate if the items asked for symptoms rather than conditions. Or a general item might be added to ask for symptoms that bother the respondents but for which they do not know the cause.
Another strategy that might reduce underreporting is to allow respondents to describe each problem in their own terms before proceeding to more structured items. There are subcultural variations in health terminology and, in some cases, it may be possible for local physicians to translate folk terms (e.g., "high blood") into standard terminology. Even for the same individual, there may be several scripts or schemata for different types of health events (chronic conditions versus injuries), and different question orders or wordings may be needed to prompt the fullest recall of different types of conditions. If standard condition lists continue to be used, it might be easier to put them on individual cards and to group them according to conditions that tend to occur together. Respondents may find it easier to sort cards than to listen to lengthy lists, enabling them to deal with more items; grouping of conditions may facilitate retrieval. One thread running through most of these suggestions is a concern about underreporting, which can occur because a condition has not been diagnosed, because the respondent does not recognize the term for it,

because the condition has been forgotten, or because the respondent is unwilling to report it. One way to estimate the amount of underreporting is to compare prevalence rates based on NHIS data with those of other surveys--such as the Health and Nutrition Examination Survey (HANES), which includes medical examinations, and the National Medical Care Utilization and Expenditure Survey (NMCUES), which incorporates checks of physician records--or with expert rankings of prevalence. It would, of course, facilitate the comparisons if a common set of conditions were used. It was proposed that the NHIS condition items be included in the HANES interview so that the relation between self-reports of conditions and medical diagnoses could be explored. Less serious chronic conditions (e.g., sinus trouble) may be especially prone to underreporting. An experiment was suggested to compare reporting under the current methods with reporting under methods designed to enhance recall (for example, by leaving respondents a chronic conditions checklist that could be mailed in). Another thread running through this discussion was the neglect of psychological factors in health. The NHIS includes few items that assess mental health, and it does not include any of the standard scales that measure depression, stressful life events, or physical symptoms associated with stress (e.g., somaticization scales). Because of this omission, the NHIS data cannot be used to monitor trends in the prevalence of mental health problems or to assess the relationship between physical conditions and psychological states. Participants suggested the inclusion of more mental health items in the NHIS, subject to constraints of response burden and cost.

Restricted Activity

As with the utilization and condition items, there was considerable interest in the subjective side of the items concerned with restrictions in activity brought on by illness or injury and considerable concern about underreporting.
One proposal was to use random probes to find out how people interpret the term "restricted activity." The present approach may fail to measure the effects of mental illness; it would be useful, therefore, to know whether respondents include mental illness when they think about "illness or injury."

Several new approaches to the restricted activity questions were suggested, partly with a view toward reducing underreporting. For the questions on the loss of days from work, respondents could be asked first to report all days lost from work for any reason and then to say why each day was lost. Another approach would be to begin the restricted activity section with questions about normal activities during the reference period. For each activity that they normally engage in, respondents would be asked whether it was curtailed or extended during the reference period and the reason for the change. Some activities, such as reading or watching television, may increase during periods of illness. It might also be useful to broaden the scope of the restricted activity questions by asking respondents whether they had carried out their major activities during the reference period with less than their customary efficiency and, if so, why the change occurred.

Self-Perception of Health

One item on the NHIS asks respondents to rate their overall health; no other single question provoked as much discussion at the seminar. How do respondents make this judgment? Part of the answer probably involves a comparison process: respondents may compare their current health with their health at other times, or they may compare themselves with other people of the same age. Judgments of overall health are no doubt influenced by objective conditions, but the influence may be limited (e.g., respondents who have successfully adjusted to long-term conditions may discount them in evaluating their health), and perceptions of objective conditions may be as much influenced by the overall judgment as the reverse. Research on underreporting of conditions demonstrates the impact of the overall evaluation on the reporting of conditions: underreporting is greater for respondents who see themselves as healthy.

Global judgments typically integrate information from several dimensions. Little is known about the subjective dimensions of health; some multidimensional scaling studies might shed considerable light on the issue. It would not be surprising to find that the self-perceived health status item is affected by question context. The correlation between the condition items and ratings of overall health might be increased if the condition items came first in the interview. Even if there were a correlation under both question orders, a positivity bias might be expected, with respondents seeing themselves as healthier than their answers to the condition items would warrant.

TRANSLATING IDEAS INTO ACTION

The free-flowing discussions at the St. Michaels and Baltimore meetings led to many ideas about ways for cognitive scientists and survey researchers to work together to their mutual benefit.
Surveys can be used to collect data of interest to cognitive scientists and can serve as a vehicle for cognitive research using larger and more heterogeneous samples than those normally used in laboratory experiments. The National Health Interview Survey and other surveys might be improved by applying what cognitive scientists have already learned about comprehension, memory, and judgment. New research studies of the cognitive processes involved in answering survey questions and conducting survey interviews should provide a basis for further improvements.

The real challenge and the main goal of the CASM project was to translate some of these ideas into specific collaborative research programs and activities. Now, slightly more than one year after the St. Michaels meeting, it is evident that this is being done by CASM participants and others. The details are given in Chapter 2.