

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 99
Intuitive Judgment and the Evaluation of Evidence
Dale Griffin
Stanford University

What I do wish to maintain -- and it is here that the scientific attitude becomes imperative -- is that insight, untested and unsupported, is an insufficient guarantee of truth, in spite of the fact that much of the most important truth is first suggested by its means. (Bertrand Russell, 1969, p. 16)

Intuitive judgment is often misleading. Natural processes of judgment and decision-making are subject to systematic errors. The formal structures of science -- objective measurement, statistical evaluation and strict research design -- are designed to minimize the effect of such errors. Relying only on intuitive methods of assessing evidence may lead to faulty beliefs about the world, and may make those beliefs difficult or impossible to change. When very important decisions are to be made, the absence of a well-defined formal strategy is apt to prove costly.

The systematic biases which plague our gathering and evaluation of evidence are adaptive on an individual level, in that they increase the ease of decision-making and protect our emotional well-being. Such benefits, however, may be purchased at a high price. In the context of beliefs and decisions about national policy issues, the speedy and conflict-free resolution of uncertainty is not adaptive when the cost is poor utilization of the evidence.

Bertrand Russell (1969) argued that science needs both intuition and logic, the first to generate (and appreciate) ideas and the second to evaluate their truth. But problems arise when intuitive processes replace logic as the arbiter of truth. Especially in the arena of public policy, evaluation decisions must be based on grounds that can be defined, described, and publicly observed. [1]

This paper examines the risks of assessing evidence by subjective judgment. Specific examples will focus on the difficulty of assessing claims for techniques designed to enhance human performance, especially those related to parapsychological phenomena. More generally, the themes will include why personal experience is not a trustworthy source of evidence, why people disagree about beliefs despite access to the same evidence, and why evidence so rarely leads to belief change. The underlying message is this: The checks and balances of formal science have developed as protection against the unreliability of unaided human judgment.

Overview of the analysis

Organized science can be modeled as a formalized extension of the ways that humans naturally learn about the world (Kelly, 1955). [2] In order to predict and control their environment, people generate hypotheses about what events go together, and then gather evidence to test these hypotheses. If the evidence seems to support the current belief, the working hypothesis is retained; otherwise it is rejected. [3]

[1] Or in the words of Dawes (1980, p. 68): "In a wide variety of psychological contexts, systematic decisions based on a few explicable and defensible principles are superior to intuitive decisions -- because they work better, because they are not subject to conscious or unconscious biases on the part of the decision maker, because they can be explicated and debated, and because their basis can be understood by those most affected by them."

[2] Since there is no single model of "formal science", my references will be to the most consensual features: experimental method and quantitative measurement and analysis. The specific contrasts between intuition and formal methods will involve only the features of extensional probability and experimental design that can be found in standard introductory experimental texts (e.g. Carlsmith, Ellsworth & Aronson, 1976; Freedman, Pisani & Purves, 1978; Neale & Liebert, 1980).

[3] The contrast between intuitive and scientific methods is not meant to imply that scientific procedure is always motivated by rational processes (Broad & Wade, 1982). But scientific methods do attempt to minimize the impact of the biases that strike laypeople and scientists alike. A good account of current philosophic criticisms of the social sciences can be found in Fiske & Shweder (1986).

Science adds quantitative measurement to this process. This measurement can be explicitly recorded and the strength of the evidence for a particular hypothesis can be objectively tallied. The key difference between intuitive and scientific methods is that the measurement and analysis of the scientific investigation are publicly available, while intuitive hypothesis-testing takes place inside one person's mind.

Recent psychological research has examined ways in which intuitive judgment departs from formal models of analysis -- and in focusing on such "errors" and "biases", this research has pinpointed some natural mechanisms of human judgment. In particular, attention has been focused on heuristic "shortcuts" that make our judgments more efficient, and on protective defenses that maintain the emotional state of the decision-maker.

The first section of this paper will examine the costs associated with our mental shortcuts (information-processing or cognitive biases). The second section will discuss the problems caused by our self-protective mechanisms (motivational biases). The third section will discuss how both these types of biases come into play when we are called upon to evaluate evidence that has passed through some mediator: press, TV, or government. This source of information has special properties in that we must evaluate both the source of the evidence and the quality of the evidence. In the final section, the benefits of formal research will be demonstrated.

Problems in Evaluating Evidence I: Information-processing biases

The investigation of cognitive biases in judgment has followed the tradition of the study of perceptual illusions. Much that we know about the organization of the human visual system, for example, comes from the study of situations in which our eye and brain are "fooled" into seeing something that is not there (Gregory, 1970). The most remarkable capacity of the human perceptual system is that it can take in an array of ambiguous information and construct a coherent, meaningful representation of the world. But we generally do not realize how subjective this construction is. Perception seems so immediate to us that we feel as if we are taking in a copy of the true world as it exists. Cognitive judgments have the same feeling of "truth" -- it is difficult to believe that our personal experience does not perfectly capture the objective world.

The systematic biases I will be discussing throughout this section operate at a basic and automatic level. Controlled psychological experimentation has given us many insights into these processes beneath our awareness and beyond our control. The conclusions of these experiments are consistent: these processes are set up to promote efficiency and a sense of confidence. Efficient short-cuts are set up to minimize computation and avoid paralyzing uncertainty. But the short-cuts also lead to serious flaws in our inferential processes, and the illusions of objectivity and certainty prevent us from recognizing the need for using formal methods when the decision is important.

In the Müller-Lyer visual illusion, the presence of opposite-facing arrowheads on two lines of the same length makes one look longer than the other (see Figure 1). But when we have a ruler, we can check that they are the same length, and we believe the formal evidence, rather than that of our fallible visual system. With cognitive biases, the analogue of the ruler is not clear. Against what should we validate our judgmental system?

The traditional comparison: Clinical versus statistical prediction

The most common standard against which human judgment has been measured is the efficiency of actuarial, or statistical, prediction. In the 1950's, researchers began to examine how well expert intuition compared with simple statistical combining rules in predicting mental health prognoses and other personnel outcomes.

Typically, such studies involved giving several pieces of information -- such as personality and aptitude test scores -- about a number of patients or job applicants to a panel of experts. Each of these clinical judges would give an opinion about the likely outcome of each case. The actuarial predictions were obtained by a simple statistical "best fit" procedure that defined some mathematical way of combining the pieces of information, and determined the cutoff score that would separate "health" from "pathology" or job "success" from "failure". The predictions from the human judges and the statistical models were then compared with the actual outcomes.

The clinical judges involved in these studies were exceedingly confident that statistical models based on obvious relationships could not capture the subtle strategies that they had developed over years of personal experience. But not only were the actuarial predictions superior to the expert intuitions, many studies indicated "that the amount of professional training and experience of the judge does not relate to his judgmental accuracy" (Goldberg, 1968, p. 484).

These actuarial models were not highly sophisticated mathematical formulae that went beyond the computational power of human judges. Instead, the simplest models were the most effective. For example, when clinical psychologists attempted to diagnose psychotics on the basis of their MMPI profile, simply adding up four scales (the choice of the "best fit" criterion) led to better prediction than the expert judgment of the best of the 29 clinicians (Goldberg, 1965).
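The "best fit" procedure described above, combining a few scores and choosing a cutoff from known outcomes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-score profiles, the outcomes, and the function names `fit_cutoff` and `actuarial_predict` are invented here and are not the MMPI scales or the rule from Goldberg's study.

```python
def fit_cutoff(scores, outcomes):
    """Choose the cutoff on a summed score that classifies the most known cases correctly."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(scores)):
        # A case is predicted positive when its summed score reaches the cutoff.
        correct = sum((s >= cutoff) == o for s, o in zip(scores, outcomes))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def actuarial_predict(profile, cutoff):
    """Unit-weighted rule: add up the scale scores and compare with the cutoff."""
    return sum(profile) >= cutoff


# Hypothetical training cases: four scale scores per person, plus the known outcome.
profiles = [
    (60, 55, 70, 65), (45, 50, 48, 52), (72, 68, 75, 70),
    (50, 47, 55, 49), (66, 70, 62, 71), (40, 52, 45, 50),
]
outcomes = [True, False, True, False, True, False]

cutoff = fit_cutoff([sum(p) for p in profiles], outcomes)
new_case_prediction = actuarial_predict((58, 60, 66, 70), cutoff)
```

The point of the sketch is how little machinery such a rule needs: one sum and one threshold, both fixed in advance and open to inspection, which is what makes the comparison with expert intuition so striking.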

A minor upheaval in clinical psychology occurred in reaction to Meehl's (1955) monograph, which reviewed a number of studies demonstrating the superiority of objective statistical prediction to the clinical intuition most often used to make judgments of prognosis. Meehl's review was followed by a flood of publications illustrating that simple prediction methods based on the objective tabulation of relationships were almost always superior to expert clinical intuition in diagnosing brain damage, categorizing psychiatric patients, predicting criminal recidivism, and predicting college success (e.g. Kelly & Fiske, 1951). These analyses of clinical judgment in the 1950's were the first to pinpoint many of the weaknesses of human intuition that are the subject of the first part of this paper.

The most important aspect of the clinical-statistical prediction debate is that the clinicians involved were very confident in their intuitive judgment. This combination of demonstrably sub-optimal judgments and continued confidence of the judges set the stage for the two themes of the judgment literature: What is wrong with human judgment? and Why don't people naturally realize the limitations of human intuitive judgment?

Another aspect of this debate that is still important today is the strong reaction of the proponents of human intuition. Some go so far as to define rational judgment by what humans do (Cohen, 1981). In particular, many clinically-oriented theorists fear that an emphasis on measurable outcomes dehumanizes social science. Though Meehl was supportive of clinical work, and the point of his article was to change the focus of clinical psychology from prediction and categorization to theory, his conclusions received virulent criticism. The problem, the critics of statistical prediction argued, was that clinical intuition deals with deeper holistic integrations that cannot be reflected in prediction equations. Many opponents of reductionism even question the validity of quantitative evaluation.

This position fails to understand how much we can learn about human judgment from following up on the observed superiority of quantitative prediction. In what part of the decision process are we deficient? How important are these deficiencies? If we use formal quantitative models as "measuring sticks", then several areas of comparison are suggested. First, does our intuitive choice of evidence match up to that of "mindless" formal models? Second, do we retrieve and combine information as well as these models do? Third, how accurately do we follow the rules of statistics when we try to evaluate the combined information? Fourth, how well do we learn from experience? Finally, how can we protect ourselves against these errors?

1. Intuition versus formal models: selecting the information

One reason for the superiority of statistical judgment is that it utilizes information based on the observed quantifiable relationship between the predictors and the outcome. A prediction equation starts by identifying those predictors that are meaningful in a purely statistical sense. Survival tables for insurance companies, for example, are created by collecting information on many dimensions of possible relevance. The obvious variables -- sex, weight, ethnicity -- are represented, but so are others, such as family size or income, that are chosen simply because they are statistically related to life span.

Humans cannot attend to and measure every part of the social or physical environment, and cannot observe the interrelationships of every part. Instead, we must have some method of choosing a subset of the available data to monitor most closely. Generally, we rely on pre-existing theories to guide our attention. Our confidence in our intuition prevents us from appreciating the power of such theories to determine the results of the data collection. When we attend only to confirming evidence, it becomes very hard to disprove a theory.
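The idea of choosing predictors "in a purely statistical sense" can be made concrete with a small Python sketch that ranks candidate variables by the strength of their correlation with the outcome, with no theory about why they should matter. All numbers and variable names below are invented for illustration; real survival tables rest on far larger samples and more careful methods.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_predictors(candidates, outcome):
    """Order candidate variables by |r| with the outcome, strongest first."""
    return sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], outcome)),
        reverse=True,
    )


# Hypothetical records for five people (values are made up).
life_span = [70, 75, 80, 85, 90]
candidates = {
    "shoe_size": [9, 7, 9, 7, 9],
    "income": [20, 35, 30, 50, 45],
    "family_size": [1, 2, 3, 4, 5],
}

ranking = rank_predictors(candidates, life_span)
```

Here the procedure keeps whatever correlates with the outcome and ignores whatever does not, regardless of how plausible each variable seems beforehand. That indifference to prior expectation is exactly what intuitive, theory-guided attention lacks.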

The confirmation bias

Even in basic cognitive processes, there are costs to the theory-driven search for information. Humans tend to learn only one schematic representation of a problem -- and then reapply that representation in an inflexible manner to subsequent problems (Duncker, 1945; Luchins, 1942). Often the tendency to apply a familiar schema to a new problem causes those with prior training to miss easier, more efficient ways to solve the problem. People trying to solve logical puzzles doggedly set out to prove their hypothesis by searching out confirming examples, when they would be much more efficient if they would search for disconfirming examples (Wason, 1960). It seems much more natural to search for examples that "fit" with the theory being tested than to search for items that would disprove the theory.

The most dramatic example of theory-driven data collection is the problem of the "self-fulfilling prophecy" (Merton, 1948). This phrase has now become part of popular culture and refers to the way that our theories can actually cause others to act towards us in the way that we expect. The classic work by Rosenthal and his colleagues on this topic is reviewed in detail in another paper in this series, and so will be touched upon only briefly.

Especially well-known is the study by Rosenthal and Jacobson (1973) entitled Pygmalion in the Classroom. Teachers were given false information on the expected achievement of some of their students. Based on the expectations created by this information, the teachers went on to treat the randomly selected "late-bloomers" so differently that these students scored especially highly on subsequent achievement tests.

The standard wisdom is that such demonstrations point out the absolute necessity of employing experimenters "blind" to the hypothesis in scientific research. When experimenters know how the subjects in a particular condition "should" behave, it is impossible not to give unconscious clues to the subjects. But in everyday experience we always have some guiding theory or stereotype. We are not
blind to our expectations or theories about how people will behave.

Snyder (1981) has examined how people investigate theories in social situations. People who try to determine if others are extroverted ask questions about extroverted qualities -- and discover that most people are extroverts. People who try to determine if others are introverted ask about introverted qualities -- and discover that most people are introverts. Men who believe that they are having a phone conversation with an attractive woman talk in an especially friendly way. When they do this, their unseen woman partner responds in friendly and "attractive" ways.

Everyone is familiar with the vicious competitor who is certain that it is a "dog-eat-dog" world. Studies of competitive games reveal that these people have beliefs about the world that cause others to act in a way that maintains those very beliefs (Kelley & Stahelski, 1970). Aggressive competitors in these studies believed that they had to "get" their opponents before their opponents got them. Sure enough, their opponents responded to their aggressive moves with aggressive countermoves, "proving" the competitive theory of human nature.

Such biases do not need to come from strong long-standing theories; they can be created within one situation. When people observe a contestant start out with a string of correct answers, they assimilate the rest of his or her performance to their first impression. A person who starts out well is judged more intelligent than a person who gets the same total number of answers correct but starts out poorly (Jones, Rock, Shaver, Goethals & Ward, 1968).

This research provides one answer to the question: Why do people remain confident in the validity of poor theories? If you hold a theory strongly and confidently, then your search for evidence will be dominated by those attention-getting events that confirm your theory.
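The inefficiency of searching only for confirming examples, noted above in connection with Wason (1960), can be shown with a toy version of a rule-discovery puzzle. In this Python sketch a tester holds a narrow hypothesis about number triples while the hidden rule is broader. The two rules and the test triples are assumptions chosen for illustration, not a reconstruction of the original materials.

```python
def hidden_rule(triple):
    """The rule the tester is trying to discover: any strictly increasing triple."""
    a, b, c = triple
    return a < b < c


def my_hypothesis(triple):
    """The tester's narrower guess: each number is two more than the last."""
    a, b, c = triple
    return b - a == 2 and c - b == 2


def informative(tests):
    """Return the tests whose answer would contradict the tester's hypothesis."""
    return [t for t in tests if my_hypothesis(t) != hidden_rule(t)]


# Triples chosen because they fit the hypothesis (a confirming search).
confirming_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Triples chosen because the hypothesis says they should fail (a disconfirming search).
disconfirming_tests = [(1, 2, 3), (5, 10, 20), (6, 4, 2)]

surprises_from_confirming = informative(confirming_tests)
surprises_from_disconfirming = informative(disconfirming_tests)
```

Every confirming test receives a "yes", so the tester's confidence grows while the mistaken hypothesis is never exposed. Only the triples expected to fail can return the surprising "yes" that reveals the hypothesis is too narrow.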

... belief in the new transcendent physics remains.

Some recent examples of the "vividness" criterion in media reports are the press coverage given to metal-bending children (e.g. Defty, Washington Post, March 2, 1980) and the tremendous attention given the Columbus, Ohio, poltergeist (Safran, Reader's Digest, December 1984; San Francisco Chronicle, March 7, 1984, from Associated Press). Both stories developed through extremely unreliable personal experience (Randi, 1983; Kurtz, 1984b) and demonstrate the way that personal reports fit the requirements of the media better than caution or rigor. Experimental analysis is rarely as dramatic or newsworthy as personal reports, especially since rigorous analysis emphasizes a cautious, conservative approach. Follow-up stories on the "debunking" of these phenomena rarely receive comparable attention to the first excited reports.

The public television program Nova is regarded as one of the best popular treatments of scientific affairs in any communication medium. Yet its program on ESP has been vilified by skeptics of paranormal phenomena (Kurtz, 1984b). It tried to show both sides of the issue -- it included dramatic "recreations" of the most famous ESP experiments and interviews with critics of ESP who proposed alternative explanations of these experiments. The recreated stories were more exciting and vividly memorable than the interviews. The enthusiasm and hopefulness of the believers was more gripping than the skeptics' "accentuation of the negative". What were the producers of Nova to do about the fact that what made a good story also was memorable and persuasive -- even though these elements were irrelevant to what was true? In this case, they went for the good story.

Perceptual biases and mediated information

People with strong preexisting beliefs are rarely affected by any presentation of evidence. Instead, they manage to find some confirmation in all presentations. The "biased assimilation" of evidence relevant to our beliefs is a phenomenon that seems obviously true of others, but sometimes difficult to believe in ourselves.

Consider a classic social psychological study of students' perceptions of the annual Princeton-Dartmouth football game (Hastorf and Cantril, 1954). Students from the opposing schools watched a movie of the rough 1951 football game and were asked to carefully record all infractions. The two groups ended up with different scorecards based on the same game. Of course, this is not remarkable at all. We see this in sports enthusiasts and political partisans every day. But what is worth noting is that the students used objective trial-by-trial recording techniques and they still saw different games if they were on different sides.

This is a clue to the reason that people cannot understand why others continue to disagree with them, even after they have been shown the "truth". We construct our perceived world on the basis of expectations and theories, and then we fail to take this constructed nature of the world into account. When we talk about the same "facts" we may not be arguing on the basis of the same construed evidence. This is especially important when we are faced with interpreting mixed evidence. In almost all real-world cases, evidence does not come neatly packaged as "pro" or "con", and we have to interpret how each piece of evidence supports each side.

In a more recent extension of this idea, social psychologists at Stanford University presented proponents and opponents of capital punishment with some studies that purported to show that deterrence worked, and some studies apparently showing that capital punishment had no deterrence effect (Lord, Ross & Lepper, 1979). They reasoned that common sense must dictate that mixed evidence should lead to a decrease in certainty in the beliefs of both partisan groups. But if partisans accept
supportive evidence at face value, critically scrutinize contradictory evidence, and construe ambiguous evidence according to their theories, both sides might actually strengthen their beliefs on the basis of the mixed evidence.

"The answer was clear in our subjects' assessment of the pertinent deterrence studies. Both groups believed that the methodology that had yielded evidence supportive of their view had been clearly superior, both in its relevance and freedom from artifact, to the methodology that had yielded non-supportive evidence. In fact, however, the subjects were evaluating exactly the same designs and procedures, with only the purported results varied....To put the matter more bluntly, the two opposing groups had each construed the "box-score" vis a vis empirical evidence as 'one good study supporting my view, and one lousy study supporting the opposite view' -- a state of affairs that seemingly justified the maintenance and even the strengthening of their particular viewpoint" (Ross, 1986, p. 14).

This result leads to a sense of pessimism for those of us who think that "truth" comes from the objective scientific collection of data, and from a solid replicable base of research. Giving the same mixed evidence to two opposing groups may drive the partisans further apart. How is intellectual and emotional rapprochement possible?

One possible source of optimism comes from related work by Ross and his colleagues (Ross, Lepper & Hubbard, 1975) in which the experimenters gave subjects false information about their ability on some task. After subjects built up a theory to explain this ability, the experimenters discredited the original information, but the subjects retained a weaker form of the theory they had built up. The only form of debriefing that effectively abolished the (inappropriate) theory involved telling the subjects about the perseverance phenomenon itself. This debriefing about the actual psychological process involved finally allowed the subjects to remove the effect of the false information. Biased assimilation may be weakened in a similar way: when we understand that our most "objective" evaluations of evidence involve such bias, we may be more able to understand that our opponents may be reasonable people.

Another reaction to processed evidence is the perception of hostile media bias. Why should politicians from both ends of the spectrum believe that the media is particularly hostile to their side? At first glance, this widespread phenomenon seems to contradict assimilative biases -- often, we don't react to stories in the press by selectively choosing supportive evidence; instead we perceive that the news story is deliberately slanted in favor of evidence against our side. Ross and colleagues speculated that the same biasing construal processes are at work. A partisan has a fixed construction of the truth that lines up with his or her beliefs, and when "evenhanded" evaluations are presented, they seem to stress the questionable evidence for the opposition.

Support for these speculations came from studies on the news coverage of both the 1980 and 1984 presidential elections and the 1982 "Beirut Massacre" (Vallone, 1986; Vallone, Ross & Lepper, 1985). These issues were chosen because there were actively involved partisans on both sides available. The opposing parties watched clips of television news coverage. Not only did they disagree about the validity of the facts presented, and about the likely beliefs of the producers of the program, but they acted as if they saw different news clips. "Viewers of the same 30-minute videotapes reported that the other side had enjoyed a greater proportion of favorable facts and references, and a smaller proportion of negative ones, than their own side" (Ross, 1986).

However, objective viewers tended to rate the broadcasts as relatively unbiased. These "objective" viewers were defined by the experimenters as those without personal involvement or strong opinions about the issues. But the partisans themselves -- whether they are involved in college football, the capital punishment debate, party politics or the Arab-Israeli conflict -- claim to be evaluating the evidence on its own merits. And in a sense they are: They evaluate the quality of the evidence as they have constructed it in their mind. It is the illusion of "direct perception" that is the fatal barrier to understanding why others disagree with us. To the extent that we "fill in" ambiguities
in the information given we can find interpretations that make the evidence fit our model. Because scientific practice demands public definition of concepts, measures and phenomena, personal constructions are minimized and meaningful debate can take place. But when we rely on casual observation, personal experience and entertaining narratives as sources of evidence, we have too much room to create our own persuasive construal of the evidence.

Problems in Evaluating Evidence IV: The Effect of Formal Research

Formal research structure and quantitative analysis may not be the only, or best, route to "understanding" problems. Often, an in-depth qualitative familiarity with a subject area is necessary to truly grasp the nature of a problem. But in all public policy programs, a private understanding must be followed by a public demonstration of the efficacy of the program. Only quantitative analysis leads to such a demonstration, and only quantitative evidence will force partisans to take the other side seriously. The effect of the acceptance of this argument can be seen in different ways in two domains: parapsychological research, and medicine. The effect of the rejection of this argument can be seen in the development of the human potential movement.

Modern parapsychology is almost entirely an experimental science, as any cursory look through its influential journals will demonstrate. Articles published in the Journal of Parapsychology or the Journal of the Society for Psychical Research explicitly discuss the statistical assumptions and controlled research design used in their studies. Most active parapsychological researchers believe that the path to scientific acceptance lies through the adoption of rigorous experimental method. Robert Jahn, formerly dean of engineering and applied sciences at Princeton University and an active experimenter in this field, argues that "further careful study of this formidable field seems justified, but only within the context of very well conceived and technically impeccable experiments of
large data-base capability, with disciplined attention to the pertinent aesthetic factors, and with more constructive involvement of the critical community" (Jahn, 1982, quoted in Hyman, 1985, p. 4). This attitude has not caused the traditional scientific institutions to embrace parapsychology, so what have parapsychologists gained from it?

Parapsychologists have now amassed a large literature of experiments, and this compendium of studies and results can now be assessed using the language of science. Discussions of the status of parapsychological theories can be argued on the evidence: quantified, explicit evidence. As it stands, the evidence for psychic phenomena is not convincing to most traditional scientists (Hyman, 1981). But critical discussions of the evidence can take place on the basis of specifiable problems, and not only on the basis of beliefs and attitudes (e.g. the exchange between Hyman and Honorton on the quality of the design and analysis of the psi ganzfeld experiments, starting with Hyman, 1977; and Honorton, 1979).

In direct contrast to this progression is the attitude of the human potential movement towards evaluation and measurement. Kurt Back (1972) titled his personal history of the human potential movement "Beyond Words" but it could have been just as accurately called "Beyond Measurement". He begins his book and his history with an examination of the roots of the movement in the post-war enthusiasm for applied psychology. Academic psychologists and sociologists were anxious to measure the increase in efficiency that would result from group educational activities. They examined group productivity, the solidarity and cohesion of the groups themselves, as well as the well-being of the group members. Few measurable changes were found, and this led the research-oriented scientists to either lose interest in these group phenomena or to lose interest in quantitative measurement. Many of those involved in the group experiments -- even some of the scientists who began with clearly experimental
OCR for page 99
44 1 outlooks-- were cauBlt up in the phenomenology. the experience of the group processes. Back describes many influendal workers in this movement who started out with keen beliefs Hat controBed experiments And groups presses would reveal si - ficant observable effects. When these were not forthcoming, Me believers made two claims: Be effects of group processes were too subtle, diffuse and holistic to be measured by reductionist science, and the only evidence Cat really mattered was subjective experience-- the individual case was Be only level of interest, and this level could never be cape by extemal "objective" measurements. "Believing the language of the movement, one might look for research, proof, and the acceptability of disproof. In fact, Be followers of the movement are quote immune to rational argument or persuasion. The experience they are seeking exists, and Be believ- ers are happy in their closed system which shows them mat Hey alone have We insights and emotional beliefs....Seen in this light, the history of sensitivity Wing is a stnlggle to get beyond science" (Back, p. 204~. f The dangers in trying to get beyond science in an important policy area ale best described by an example fiom surgical medicine. This example is often used in introductory statistics' classes because it demonstrates that good research really matters in He world. It shows how opinions based On personal experience or even uncontrolled research can cause the adoption or conirnuadon of dangerous policies. One treatment for severe bleeding caused by cinhosis of the liver is to send the blood through a portacaval shunt. This operation is time-consuming and risky. Many studies (at least 50), of varying sophistication, have been undertaken to determine if the benefits outweigh the risks. (These studies are reviewed in Grace, Muench, and Chalmers, 1966; He statistical meaning is discussed in Freedman, Pisan~ & Pumes, 1978~. 
The message of the studies is clear: the poorer studies exaggerate the benefits of the surgery. Seventy-five percent of the studies without control groups (24 out of 32) were very enthusiastic about the benefits of the shunt. In the studies which had control groups which were not randomly assigned, 67% (10 out of 15) were very enthusiastic about the benefits. But none of the studies with random assignment to control and experimental groups had results that led to a high degree of enthusiasm. Three of these studies showed the shunt to have no value whatsoever. In the experiments without controls, the physicians were accidentally biasing the outcome by including only the most healthy patients in the study. In the experiments with nonrandomized controls, the physicians were accidentally biasing the outcome by assigning the poorest patients to the control group that did not receive the shunt. Only when the confound of patient health was removed by randomization was it clear that the risky operation was of little or no value.

Good research does matter. Even physicians, highly selected for intelligence and highly trained in intuitive assessment, were misled by their daily experience. Because the formal studies were publicly available, and because the quality of the studies could be evaluated on the basis of their experimental method, the overall conclusions were decisive. Until the human potential movement agrees on the importance of quantitative evaluation, it will remain split into factions based on ideologies maintained by personal experience.

Formal research methods are not the only or necessarily the best way to learn about the true state of nature. But good research is the only way to ensure that real phenomena will drive out illusions. The story of the "discovery" of N-rays in France in 1903 reveals how even physics, the hardest of the hard sciences, could be led astray by subjective evaluation (Broad & Wade, 1982, p. Ilk). This "new" form of X-rays made sparks brighten when viewed by the naked eye. The best physical scientists in France accepted this breakthrough because they wanted to believe in it. It took considerable logical and experimental effort to convince the scientific establishment that the actual phenomenon was self-deception. Good research can disconfirm theories; subjective judgment rarely does.
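
The confound described in the portacaval shunt example can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a reconstruction of any of the reviewed studies: the patients, the health scores, the noise level, and the function name `simulate` are all hypothetical. The simulated operation has no true effect at all, so any apparent benefit comes only from how patients are assigned to groups.

```python
import random
from statistics import mean


def simulate(n_patients=2000, seed=0):
    """Compare biased and randomized assignment for a treatment with NO true effect.

    All quantities are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical baseline health of each patient, between 0 and 1.
    health = [rng.random() for _ in range(n_patients)]

    # Outcome depends on health plus noise only; the operation adds nothing.
    outcome = [h + rng.gauss(0, 0.1) for h in health]

    def apparent_effect(treated_ids, control_ids):
        # Difference between the mean outcomes of the two groups.
        treated = mean(outcome[i] for i in treated_ids)
        control = mean(outcome[i] for i in control_ids)
        return treated - control

    ids = list(range(n_patients))
    half = n_patients // 2

    # Biased assignment: the healthiest half receive the operation,
    # the poorest patients are left in the control group.
    by_health = sorted(ids, key=lambda i: health[i], reverse=True)
    biased = apparent_effect(by_health[:half], by_health[half:])

    # Randomized assignment: group membership is unrelated to health.
    shuffled = ids[:]
    rng.shuffle(shuffled)
    randomized = apparent_effect(shuffled[:half], shuffled[half:])

    return biased, randomized


biased_effect, randomized_effect = simulate()
print(f"Apparent benefit, biased assignment:     {biased_effect:.3f}")
print(f"Apparent benefit, randomized assignment: {randomized_effect:.3f}")
```

Under these assumptions the biased comparison reports a large apparent benefit, while the randomized comparison stays close to zero, which is the pattern the reviewed studies showed as their designs improved.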

In his critique of the use of poor research practices, Pitfalls in Human Research, Barber (1976) points out that many flaws of natural inference can creep into scientific research. "The validity and generalizability of experiments can be significantly improved by making more explicit the pitfalls that are integral to their planning...and by keeping the pitfalls in full view of researchers who conduct experimental studies" (pp. 90-91). While scientists and scientific methods are not immune to the flaws of subjective judgment, good research is designed to minimize the impact of these problems.

The proper use of science in public policy involves replacing a "person-oriented" approach with a "method-oriented" approach (Hammond, 1978). When critics or supporters focus on the person who is setting policy criteria, the debate involves the bias and motivations of the people involved. But attempts to precisely define the variables of interest and to gather data that relate to these variables focus the adversarial debate on the quality of the methods used. This "is scientifically defensible not because it is flawless (it isn't), but because it is readily subject to scientific criticism" (Hammond, 1978, p. 135).

Intuitive Judgment and the Evaluation of Evidence: A Summary

Personal experience seems a compelling source of evidence because it involves the most basic processing of information: perception, attention, and memory storage and retrieval. Yet while we have great confidence in the accuracy of our subjective impressions, we do not have conscious access to the actual processes involved. Psychological experimentation has revealed that we have too much confidence in our own accuracy and objectivity. Humans are designed for quick thinking rather than accurate thinking. Quick, confident assessment of evidence is adaptive when hesitation, uncertainty and self-doubt have high costs. But natural shortcut methods are subject to systematic errors, and our introspective feelings of accuracy are misleading.

These errors of intuitive judgment lead people to search out confirming evidence, to interpret mixed evidence in ways that confirm their expectations, and to see meaning in chance phenomena. This same biased processing of information makes it very difficult to change our beliefs and to understand the point of view of those with opposing beliefs. These errors and biases are now well documented by psychologists and decision theorists, and the improvement of human judgment is of central concern in current research.

The long-term response to this knowledge requires broad educational programs in basic statistical inference and formal decision-making, such as those proposed and examined by various authors in Kahneman et al. (1982). Already, business schools include "de-biasing" procedures in their programs of formal decision-making. But with the complex technological nature of our society, most researchers believe that some instruction in how to be a better consumer of information should start in public schools. The immediate response should be a renewed commitment to formal structures in deciding important policy, and a new realization that personal experience cannot be decisive in forming such policy.

As Gilbert, Light and Mosteller (1978) point out in their review of the efficacy of social innovations, only true experimental trials can yield knowledge that is reliable and cumulative. While formal research is slow and expensive, and scientific knowledge increases by tiny increments, the final result is impressively useful. Perhaps most important, explicit public evidence is our best hope for moving toward a consensus on appropriate public policy.

Footnote 2. (Indicator 2 should be added to page 51, line 11, r=.1~2) After preparation of this paper we learned of a possible problem in the randomization procedures employed by the investigator contributing the largest number (9) of Ganzfeld studies to the set of 28 summarized in this section. Accordingly we constructed Table 4a to investigate the effect on the mean and median effect sizes of omitting all the studies conducted by this investigator. The top half of Table 4a shows this effect when we consider the 28 studies as the unit of analysis. Omitting the 9 questioned studies lowers the mean effect size from .28 to .26 and does not change the median effect size, which remains at .32. The lower half of Table 4a shows this effect when we consider the 10 investigators as the unit of analysis. Omitting the investigator in question lowers the mean effect size from .23 to .22 but raises the median effect size from .32 to .34. It seems clear that the questioned randomization of the 9 studies of this investigator cannot have contributed substantially to an inflation of the overall effect size.