Intuitive Judgment and the Evaluation of Evidence
Dale Griffin
Stanford University
What I do wish to maintain-- and it is here that the scientific attitude becomes imperative-- is that insight,
untested and unsupported, is an insufficient guarantee of truth, in spite of the fact that much of the most
important truth is first suggested by its means. (Bertrand Russell, 1969, p. 16)
Intuitive judgment is often misleading. Natural processes of judgment and decision-making are
subject to systematic errors. The formal structures of science-- objective measurement, statistical
evaluation and strict research design-- are designed to minimize the effect of such errors. Relying only
on intuitive methods of assessing evidence may lead to faulty beliefs about the world, and may make
those beliefs difficult or impossible to change. When very important decisions are to be made, the
absence of a well-defined formal strategy is apt to prove costly.
The systematic biases which plague our gathering and evaluation of evidence are adaptive on
an individual level, in that they increase the ease of decision-making and protect our emotional
well-being. Such benefits, however, may be purchased at a high price. In the context of beliefs and
decisions about national policy issues, the speedy and conflict-free resolution of uncertainty is not
adaptive when the cost is poor utilization of the evidence.
Bertrand Russell (1969) argued that science needs both intuition and logic, the first to generate
(and appreciate) ideas and the second to evaluate their truth. But problems arise when intuitive
processes replace logic as the arbiter of truth. Especially in the arena of public policy, evaluation
decisions must be based on grounds that can be defined, described, and publicly observed.1
This paper examines the risks of assessing evidence by subjective judgment. Specific examples
will focus on the difficulty of assessing claims for techniques designed to enhance human performance,
especially those related to parapsychological phenomena. More generally, the themes will include why
personal experience is not a trustworthy source of evidence, why people disagree about beliefs despite
access to the same evidence, and why evidence so rarely leads to belief change. The underlying
message is this: The checks and balances of formal science have developed as protection against the
unreliability of unaided human judgment.
Overview of the analysis
Organized science can be modeled as a formalized extension of the ways that humans naturally
learn about the world (Kelly, 1955).2 In order to predict and control their environment, people
generate hypotheses about what events go together, and then gather evidence to test these hypotheses. If
the evidence seems to support the current belief, the working hypothesis is retained; otherwise it is
rejected.
1 Or in the words of Dawes (1980, p. 68): "In a wide variety of psychological contexts,
systematic decisions based on a few explicable and defensible principles are superior to intuitive
decisions-- because they work better, because they are not subject to conscious or unconscious biases
on the part of the decision maker, because they can be explicated and debated, and because their basis
can be understood by those most affected by them."
2 Since there is no single model of "formal science", my references will be to the most
consensual features: experimental method and quantitative measurement and analysis. The specific contrasts
between intuition and formal methods will involve only the features of extensional probability and
experimental design that can be found in standard introductory experimental texts (e.g. Carlsmith,
Ellsworth & Aronson, 1976; Freedman, Pisani & Purves, 1978; Neale & Liebert, 1980).
The contrast between intuitive and scientific methods is not meant to imply that scientific
procedure is always motivated by rational processes (Broad & Wade, 1982). But scientific methods do
attempt to minimize the impact of the biases that strike laypeople and scientists alike. A good account
of current philosophic criticisms of the social sciences can be found in Fiske & Shweder (1986).
Science adds quantitative measurement to this process. This measurement can be explicitly
recorded and the strength of the evidence for a particular hypothesis can be objectively tallied. The key
difference between intuitive and scientific methods is that the measurement and analysis of the
scientific investigation are publicly available, while intuitive hypothesis-testing takes place inside one
person's mind.
Recent psychological research has examined ways in which intuitive judgment departs from
formal models of analysis-- and in focusing on such "errors" and "biases", this research has pinpointed
some natural mechanisms of human judgment. In particular, attention has been focused on heuristic
"shortcuts" that make our judgments more efficient, and on protective defenses that maintain the
emotional state of the decision-maker. The first section of this paper will examine the costs associated with
our mental shortcuts (information-processing or cognitive biases). The second section will discuss the
problems caused by our self-protective mechanisms (motivational biases).
The third section will discuss how both these types of biases come into play when we are
called upon to evaluate evidence that has passed through some mediator: press, TV, or government.
This source of information has special properties in that we must evaluate both the source of the
evidence and the quality of the evidence. In the final section, the benefits of formal research will be
demonstrated.
Problems in Evaluating Evidence I: Information-processing biases
The investigation of cognitive biases in judgment has followed the tradition of the study of
perceptual illusions. Much that we know about the organization of the human visual system, for
example, comes from the study of situations in which our eye and brain are "fooled" into seeing something
that is not there (Gregory, 1970). The most remarkable capacity of the human perceptual system is
that it can take in an array of ambiguous information and construct a coherent, meaningful
representation of the world. But we generally do not realize how subjective this construction is. Perception
seems so immediate to us that we feel as if we are taking in a copy of the true world as it exists.
Cognitive judgments have the same feeling of "truth"-- it is difficult to believe that our personal experience
does not perfectly capture the objective world.
The systematic biases I will be discussing throughout this section operate at a basic and
automatic level. Controlled psychological experimentation has given us many insights into these
processes beneath our awareness and beyond our control. The conclusions of these experiments are
consistent: these processes are set up to promote efficiency and a sense of confidence. Efficient
short-cuts are set up to minimize computation and avoid paralyzing uncertainty. But the short-cuts also lead
to serious flaws in our inferential processes, and the illusions of objectivity and certainty prevent us
from recognizing the need for using formal methods when the decision is important.
In the Müller-Lyer visual illusion, the presence of opposite-facing arrowheads on two lines of
the same length makes one look longer than the other (see Figure 1). But when we have a ruler, we
can check that they are the same length, and we believe the formal evidence, rather than that of our
fallible visual system. With cognitive biases, the analogue of the ruler is not clear. Against what should
we validate our judgmental system?
The traditional comparison: Clinical versus statistical prediction
The most common standard against which human judgment has been measured is the efficiency
of actuarial, or statistical, prediction. In the 1950's, researchers began to compare how well expert
intuition performed against simple statistical combining rules in predicting mental health prognoses and
other personnel outcomes.
Typically, such studies involved giving several pieces of information-- such as personality and
aptitude test scores-- about a number of patients or job applicants to a panel of experts. Each of these
clinical judges would give their opinion about the likely outcome of each case. The actuarial
predictions were obtained by a simple statistical "best fit" procedure that defined some mathematical way of
combining the pieces of information, and determined the cutoff score that would separate "health" from
"pathology" or job "success" from "failure". The predictions from the human judges and the statistical
models were then compared with the actual outcomes.
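
The kind of procedure described above can be sketched in a few lines. This is only an illustration, not the method of any particular study: the unit-weight sum used to combine the scores, the function names, and the toy data are all assumptions made for the example.

```python
# Illustrative sketch only: function names, the unit-weight combining rule,
# and the data below are hypothetical, not taken from any study cited here.

def combine(case):
    """Combine the pieces of information by simple addition (unit weights)."""
    return sum(case)

def fit_cutoff(cases, outcomes):
    """Return the cutoff on the combined score that classifies the most
    training cases correctly (predict True when score >= cutoff)."""
    scores = [combine(c) for c in cases]
    best_cutoff, best_hits = None, -1
    for cutoff in sorted(set(scores)):
        hits = sum((s >= cutoff) == o for s, o in zip(scores, outcomes))
        if hits > best_hits:
            best_cutoff, best_hits = cutoff, hits
    return best_cutoff

def predict(case, cutoff):
    """Predict the outcome for a new case from its combined score."""
    return combine(case) >= cutoff

# Hypothetical test scores for six cases and their known outcomes.
cases = [(1, 2), (2, 2), (3, 4), (5, 4), (6, 5), (2, 1)]
outcomes = [False, False, True, True, True, False]

cutoff = fit_cutoff(cases, outcomes)  # 7 for this toy data
print(cutoff, predict((4, 4), cutoff), predict((1, 1), cutoff))
```

The point of the sketch is that every step (which scores are used, how they are combined, where the cutoff falls) is explicit and can be inspected by anyone, which is not true of a clinician's intuitive judgment.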
The clinical judges involved in these studies were exceedingly confident that statistical models
based on obvious relationships could not capture the subtle strategies that they had developed over
years of personal experience. But not only were the actuarial predictions superior to the expert intui-
tions, many studies indicated "that the amount of professional training and experience of the judge
does not relate to his judgmental accuracy" (Goldberg, 1968, p. 484).
These actuarial models were not highly sophisticated mathematical formulae that went beyond
the computational power of human judges. Instead, the simplest models were the most effective. For
example, when clinical psychologists attempted to diagnose psychotics on the basis of their MMPI
profile, simply adding up four scales (the choice of the "best fit" criterion) led to better prediction than
the expert judgment of the best of the 29 clinicians (Goldberg, 1965).
A minor upheaval in clinical psychology occurred in reaction to Meehl's (1955) monograph
which reviewed a number of studies demonstrating the superiority of objective statistical prediction to
the clinical intuition most often used to make judgments of prognosis. Meehl's review was followed by
a flood of publications illustrating that simple prediction methods based on the objective tabulation of
relationships were almost always superior to expert clinical intuition in diagnosing brain damage,
categorizing psychiatric patients, predicting criminal recidivism, and predicting college success (e.g.
Kelly & Fiske, 1951). These analyses of clinical judgment in the 1950's were the first to pinpoint
many of the weaknesses of human intuition that are the subject of the first part of this paper.
The most important aspect of the clinical-statistical prediction debate is that the clinicians
involved were very confident in their intuitive judgment. This combination of demonstrably
sub-optimal judgments and continued confidence of the judges set the stage for the two themes of the
judgment literature: What is wrong with human judgment? and Why don't people naturally realize the
limitations of human intuitive judgment?
Another aspect of this debate that is still important today is the strong reaction of the
proponents of human intuition. Some go so far as to define rational judgment by what humans do (Cohen,
1981). In particular, many clinically-oriented theorists fear that an emphasis on measurable outcomes
dehumanizes social science. Though Meehl was supportive of clinical work, and the point of his
article was to change the focus of clinical psychology from prediction and categorization to therapy, his
conclusions received virulent criticism. The problem, the critics of statistical prediction argued, was
that clinical intuition deals with deeper holistic integrations that cannot be reflected in prediction
equations. Many opponents of reductionism even question the validity of quantitative evaluation.
This position fails to understand how much we can learn about human judgment from
following up on the observed superiority of quantitative prediction. In what part of the decision process are
we deficient? How important are these deficiencies?
If we use formal quantitative models as "measuring sticks", then several areas of comparison
are suggested. First, does our intuitive choice of evidence match up to that of "mindless" formal
models? Second, do we retrieve and combine information as well as these models do? Third, how
accurately do we follow the rules of statistics when we try to evaluate the combined information?
Fourth, how well do we learn from experience? Finally, how can we protect ourselves against these
errors?
1. Intuition versus formal models: selecting the information
One reason for the superiority of statistical judgment is that it utilizes information based on the
observed quantifiable relationship between the predictors and the outcome. A prediction equation starts
by identifying those predictors that are meaningful in a purely statistical sense. Survival tables for
insurance companies, for example, are created by collecting information on many dimensions of
possible relevance. The obvious variables-- sex, weight, ethnicity-- are represented, but so are others, such
as family size or income, that are chosen simply because they are statistically related to life span.
Humans cannot attend to and measure every part of the social or physical environment, and
cannot observe the interrelationships of every part. Instead, we must have some method of choosing a
subset of the available data to monitor most closely. Generally, we rely on pre-existing theories to
guide our attention. Our confidence in our intuition prevents us from appreciating the power of such
theories to determine the results of the data collection. When we attend only to confirming evidence, it
becomes very hard to disprove a theory.
The confirmation bias-- Even in basic cognitive processes, there are costs to the theory-driven
search for information. Humans tend to learn only one schematic representation of a problem-- and
then reapply that representation in an inflexible manner to subsequent problems (Duncker, 1945;
Luchins, 1942). Often the tendency to apply a familiar schema to a new problem causes those with
prior training to miss easier, more efficient ways to solve the problem. People trying to solve logical
puzzles doggedly set out to prove their hypothesis by searching out confirming examples, when they
would be much more efficient if they would search for disconfirming examples (Wason, 1960). It
seems much more natural to search for examples that "fit" with the theory being tested, than to search
for items that would disprove the theory.
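
The inefficiency of a purely confirming search can be made concrete with a small rule-discovery puzzle. The sketch below is a hedged illustration: the hidden rule, the working hypothesis, and the test triples are assumptions chosen for the example, not a reproduction of any experiment's materials.

```python
# Illustrative sketch only: the rule, hypothesis, and test triples are
# hypothetical choices made to show the logic of the argument.

def hidden_rule(triple):
    """The rule the puzzle-solver is trying to discover: any ascending triple."""
    a, b, c = triple
    return a < b < c

def hypothesis(triple):
    """The solver's narrower working theory: numbers increasing by two."""
    a, b, c = triple
    return b - a == 2 and c - b == 2

def informative(tests):
    """Return the tests whose feedback contradicts the hypothesis,
    i.e. where the hypothesis and the hidden rule disagree."""
    return [t for t in tests if hypothesis(t) != hidden_rule(t)]

# Tests chosen because they fit the hypothesis (a confirming search).
confirming_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Tests chosen because the hypothesis says they should fail.
disconfirming_tests = [(1, 2, 3), (5, 10, 20), (6, 4, 2)]

print(informative(confirming_tests))     # []
print(informative(disconfirming_tests))  # [(1, 2, 3), (5, 10, 20)]
```

Every confirming test receives a "yes" and so leaves the solver's confidence intact, while the hypothesis is in fact too narrow. Only the tests that the hypothesis predicts will fail can reveal this.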
The most dramatic example of theory-driven data collection is the problem of the
"self-fulfilling prophecy" (Merton, 1948). This phrase has now become part of popular culture and refers to
the way that our theories can actually cause others to act towards us in the way that we expect. The
classic work by Rosenthal and his colleagues on this topic is reviewed in detail in another paper in this
series, and so will be touched upon only briefly.
Especially well-known is the study by Rosenthal and Jacobson (1973) entitled Pygmalion in
the Classroom. Teachers were given false information on the expected achievement of some of their
students. Based on the expectations created by this information, the teachers went on to treat the
randomly selected "late-bloomers" so differently that these students scored especially highly on subsequent
achievement tests.
The standard wisdom is that such demonstrations point out the absolute necessity of employing
experimenters "blind" to the hypothesis in scientific research. When experimenters know how the
subjects in a particular condition "should" behave, it is impossible not to give unconscious clues to the
subjects. But in everyday experience we always have some guiding theory or stereotype. We are not
blind to our expectations or theories about how people will behave.
Snyder (1981) has examined how people investigate theories in social situations. People who
try to determine if others are extroverted ask questions about extroverted qualities-- and discover that
most people are extroverts. People who try to determine if others are introverted ask about introverted
qualities-- and discover that most people are introverts. Men who believe that they are having a phone
conversation with an attractive woman talk in an especially friendly way. When they do this, their
unseen woman partner responds in friendly and "attractive" ways.
Everyone is familiar with the vicious competitor who is certain that it is a "dog-eat-dog" world.
Studies of competitive games reveal that these people have beliefs about the world that cause others to
act in a way that maintains those very beliefs (Kelley & Stahelski, 1970). Aggressive competitors in
these studies believed that they had to "get" their opponent before their opponents got them. Sure
enough, their opponents responded to their aggressive moves with aggressive countermoves, "proving"
the competitive theory of human nature.
Such biases do not need to come from strong long-standing theories; they can be created
within one situation. When people observe a contestant start out with a string of correct answers, they
assimilate the rest of his or her performance to their first impression. A person who starts out well is
judged more intelligent than a person who gets the same total number of answers correct but starts out
poorly (Jones, Rock, Shaver, Goethals & Ward, 1968).
This research provides one answer to the question: Why do people remain confident in the
validity of poor theories? If you hold a theory strongly and confidently, then your search for evidence
will be dominated by those attention-getting events that confirm your theory.
[...]
belief in the new transcendent physics remains.
Some recent examples of the "vividness" criteria in media reports are the press coverage given
to metal-bending children (e.g. Defty, Washington Post, March 2, 1980) and the tremendous attention
given the Columbus, Ohio, poltergeist (Safran, Reader's Digest, December 1984; San Francisco
Chronicle, March 7, 1984, from Associated Press). Both stories developed through extremely unreliable
personal experience (Randi, 1983; Kurtz, 1984b) and demonstrate the way that personal reports fit the
requirements of the media better than caution or rigor. Experimental analysis is rarely as dramatic or
newsworthy as personal reports, especially since rigorous analysis emphasizes a cautious conservative
approach. Follow-up stories on the "debunking" of these phenomena rarely receive comparable
attention to the first excited reports.
The public television program Nova is regarded as one of the best popular treatments of
scientific affairs in any communication medium. Yet its program on ESP has been vilified by skeptics
of paranormal phenomena (Kurtz, 1984b). It tried to show both sides of the issue-- it included dramatic
"recreations" of the most famous ESP experiments and interviews with critics of ESP who proposed
alternative explanations of these experiments. The recreated stories were more exciting and vividly
memorable than the interviews. The enthusiasm and hopefulness of the believers was more gripping
than the skeptics' "accentuation of the negative". What were the producers of Nova to do about the fact
that what made a good story also was memorable and persuasive-- even though these elements were
irrelevant to what was true? In this case, they went for the good story.
Perceptual biases and mediated information
People with strong preexisting beliefs are rarely affected by any presentation of evidence.
Instead, they manage to find some confirmation in all presentations. The "biased assimilation" of
evidence relevant to our beliefs is a phenomenon that seems obviously true of others, but sometimes
difficult to believe in ourselves. Consider a classic social psychological study of students' perceptions
of the annual Princeton-Dartmouth football game (Hastorf and Cantril, 1954). Students from the
opposing schools watched a movie of the rough 1951 football game and were asked to carefully record
all infractions. The two groups ended up with different scorecards based on the same game. Of course,
this is not remarkable at all. We see this in sports enthusiasts and political partisans every day. But
what is worth noting is that the students used objective trial-by-trial recording techniques and they still
saw different games if they were on different sides.
This is a clue to the reason that people cannot understand why others continue to disagree with
them, even after they have been shown the "truth". We construct our perceived world on the basis of
expectations and theories, and then we fail to take this constructed nature of the world into account.
When we talk about the same "facts" we may not be arguing on the basis of the same construed
evidence. This is especially important when we are faced with interpreting mixed evidence. In almost all
real-world cases, evidence does not come neatly packaged as "pro" or "con", and we have to interpret
how each piece of evidence supports each side.
In a more recent extension of this idea, social psychologists at Stanford University presented
proponents and opponents of capital punishment with some studies that purported to show that
deterrence worked, and some studies apparently showing that capital punishment had no deterrence effect
(Lord, Ross & Lepper, 1979). They reasoned that common sense must dictate that mixed evidence
should lead to a decrease in certainty in the beliefs of both partisan groups. But if partisans accept
supportive evidence at face value, critically scrutinize contradictory evidence, and construe ambiguous
evidence according to their theories, both sides might actually strengthen their beliefs on the basis of
the mixed evidence.
"The answer was clear in our subjects' assessment of the pertinent deterrence studies.
Both groups believed that the methodology that had yielded evidence supportive of
their view had been clearly superior, both in its relevance and freedom from artifact, to
the methodology that had yielded non-supportive evidence. In fact, however, the
subjects were evaluating exactly the same designs and procedures, with only the purported
results varied....To put the matter more bluntly, the two opposing groups had each
construed the "box-score" vis a vis empirical evidence as 'one good study supporting my
view, and one lousy study supporting the opposite view'-- a state of affairs that
seemingly justified the maintenance and even the strengthening of their particular viewpoint"
(Ross, 1986, p. 14).
This result leads to a sense of pessimism for those of us who think that "truth" comes from the
objective scientific collection of data, and from a solid replicable base of research. Giving the same
mixed evidence to two opposing groups may drive the partisans farther apart. How is intellectual and
emotional rapprochement possible?
One possible source of optimism comes from related work by Ross and his colleagues (Ross,
Lepper & Hubbard, 1975) in which the experimenters gave subjects false information about their
ability on some task. After subjects built up a theory to explain this ability, the experimenters discredited
the original information, but the subjects retained a weaker form of the theory they had built up. The
only form of debriefing that effectively abolished the (inappropriate) theory involved telling the
subjects about the perseverance phenomenon itself. This debriefing about the actual psychological process
involved finally allowed the subjects to remove the effect of the false information. Biased assimilation
may be weakened in a similar way: when we understand that our most "objective" evaluations of
evidence involve such bias, we may be more able to understand that our opponents may be reasonable
people.
Another reaction to processed evidence is the perception of hostile media bias. Why should
politicians from both ends of the spectrum believe that the media is particularly hostile to their side?
At first glance, this widespread phenomenon seems to contradict assimilative biases-- often, we don't
react to stories in the press by selectively choosing supportive evidence; instead we perceive that the
news story is deliberately slanted in favor of evidence against our side. Ross and colleagues speculated
that the same biasing construal processes are at work. A partisan has a rigid construction of the truth
that lines up with his or her beliefs, and when "evenhanded" evaluations are presented, they seem to
stress the questionable evidence for the opposition.
Support for these speculations came from studies on the news coverage of both the 1980 and
1984 presidential elections and the 1982 "Beirut Massacre" (Vallone, 1986; Vallone, Ross & Lepper,
1985). These issues were chosen because there were actively involved partisans on both sides
available. The opposing parties watched clips of television news coverage. Not only did they disagree about
the validity of the facts presented, and about the likely beliefs of the producers of the program, but
they acted as if they saw different news clips. "Viewers of the same 30-minute videotapes reported
that the other side had enjoyed a greater proportion of favorable facts and references, and a smaller
proportion of negative ones, than their own side" (Ross, 1986, p. [illegible]). However, objective viewers
tended to rate the broadcasts as relatively unbiased.
These "objective" viewers were defined by the experimenters as those without personal
involvement or strong opinions about the issues. But the partisans themselves-- if they are involved in
college football, the capital punishment debate, party politics or the Arab-Israeli conflict-- claim to be
evaluating the evidence on its own merits. And in a sense they are: They evaluate the quality of the
evidence as they have constructed it in their mind. It is the illusion of "direct perception" that is the
fatal barrier to understanding why others disagree with us. To the extent that we "fill in" ambiguities
in the information given, we can find interpretations that make the evidence fit our model. Because
scientific practice demands public definition of concepts, measures and phenomena, personal
constructions are minimized and meaningful debate can take place. But when we rely on casual observation,
personal experience and entertaining narratives as sources of evidence, we have too much room to
create our own persuasive construal of the evidence.
Problems in Evaluating Evidence IV: The Effect of Formal Research
Formal research structure and quantitative analysis may not be the only, or best, route to
"understanding" problems. Often, an in-depth qualitative familiarity with a subject area is necessary to
truly grasp the nature of a problem. But in all public policy programs, a private understanding must be
followed by a public demonstration of the efficacy of the program.
Only quantitative analysis leads to such a demonstration, and only quantitative evidence will
force partisans to take the other side seriously. The effect of the acceptance of this argument can be
seen in different ways in two domains: parapsychological research, and medicine. The effect of the
rejection of this argument can be seen in the development of the human potential movement.
Modern parapsychology is almost entirely an experimental science, as any cursory look
through its influential journals will demonstrate. Articles published in the Journal of Parapsychology or
the Journal of the Society for Psychical Research explicitly discuss the statistical assumptions and
controlled research design used in their studies. Most active parapsychological researchers believe that the
path to scientific acceptance lies through the adoption of rigorous experimental method.
Robert Jahn, formerly dean of engineering and applied sciences at Princeton University and an
active experimenter in this field, argues that "further careful study of this formidable field seems
justified, but only within the context of very well conceived and technically impeccable experiments of
large data-base capability, with disciplined attention to the pertinent aesthetic factors, and with more
constructive involvement of the critical community" (Jahn, 1982, quoted in Hyman, 1985, p. 4). This
attitude has not caused the traditional scientific institutions to embrace parapsychology, so what have
parapsychologists gained from it?
Parapsychologists have now amassed a large literature of experiments, and this compendium of
studies and results can now be assessed using the language of science. Discussions of the status of
parapsychological theories can be argued on the evidence: quantified, explicit evidence. As it stands,
the evidence for psychic phenomena is not convincing to most traditional scientists (Hyman, 1981).
But critical discussions of the evidence can take place on the basis of specifiable problems, and not
only on the basis of beliefs and attitudes (e.g. the exchange between Hyman and Honorton on the
quality of the design and analysis of the psi ganzfeld experiments, starting with Hyman, 1977; and
Honorton, 1979).
In direct contrast to this progression is the attitude of the human potential movement towards
evaluation and measurement. Kurt Back (1972) titled his personal history of the human potential
movement "Beyond Words" but it could have been just as accurately called "Beyond Measurement".
He begins his book and his history with an examination of the roots of the movement in the post-war
enthusiasm for applied psychology. Academic psychologists and sociologists were anxious to measure
the increase in efficiency that would result from group educational activities. They examined group
productivity, the solidarity and cohesion of the groups themselves, as well as the well-being of the
group members.
Few measurable changes were found, and this led the research-oriented scientists to either lose
interest in these group phenomena or to lose interest in quantitative measurement. Many of those
involved in the group experiments-- even some of the scientists who began with clearly experimental
outlooks-- were caught up in the phenomenology, the experience of the group processes.
Back describes many influential workers in this movement who started out with keen beliefs
that controlled experiments on group processes would reveal significant observable effects. When
these were not forthcoming, the believers made two claims: the effects of group processes were too
subtle, diffuse and holistic to be measured by reductionist science, and the only evidence that really
mattered was subjective experience-- the individual case was the only level of interest, and this level
could never be captured by external "objective" measurements.
"Believing the language of the movement, one might look for research, proof, and the
acceptability of disproof. In fact, the followers of the movement are quite immune to
rational argument or persuasion. The experience they are seeking exists, and the
believers are happy in their closed system which shows them that they alone have the
insights and emotional beliefs....Seen in this light, the history of sensitivity training is a
struggle to get beyond science" (Back, 1972, p. 204).
The dangers in trying to get beyond science in an important policy area are best described by
an example from surgical medicine. This example is often used in introductory statistics classes
because it demonstrates that good research really matters in the world. It shows how opinions based
on personal experience or even uncontrolled research can cause the adoption or continuation of
dangerous policies.
One treatment for severe bleeding caused by cirrhosis of the liver is to send the blood through
a portacaval shunt. This operation is time-consuming and risky. Many studies (at least 50), of varying
sophistication, have been undertaken to determine if the benefits outweigh the risks. (These studies are
reviewed in Grace, Muench, and Chalmers, 1966; the statistical meaning is discussed in Freedman,
Pisani, & Purves, 1978).
The message of the studies is clear: the poorer studies exaggerate the benefits of the surgery.
Seventy-five percent of the studies without control groups (24 out of 32) were very enthusiastic about
the benefits of the shunt. In the studies which had control groups which were not randomly assigned,
67% (10 out of 15) were very enthusiastic about the benefits. But none of the studies with random
assignment to control and experimental groups had results that led to a high degree of enthusiasm.
Three of these studies showed the shunt to have no value whatsoever.
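The percentages quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the dictionary labels and the function name are illustrative choices, not terms from the original studies.

```python
# Counts as quoted in the text: (very enthusiastic studies, total studies).
# The labels are illustrative; only the numbers come from the passage.
studies = {
    "no control group": (24, 32),
    "controls, not randomly assigned": (10, 15),
}


def percent_enthusiastic(enthusiastic, total):
    """Share of studies rated very enthusiastic, as a percentage."""
    return 100.0 * enthusiastic / total


for design, (k, n) in studies.items():
    print(f"{design}: {k}/{n} = {percent_enthusiastic(k, n):.0f}%")
```

The text reports that no randomized study produced a high degree of enthusiasm, so the corresponding rate for that design is zero.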
In the experiments without controls, the physicians were accidentally biasing the outcome by
including only the most healthy patients in the study. In the experiments with nonrandomized controls,
the physicians were accidentally biasing the outcome by assigning the poorest patients to the control
group that did not receive the shunt. Only when the confound of patient health was removed by
randomization was it clear that the risky operation was of little or no value.
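The confound described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reanalysis of the shunt studies: the health score, the noise levels, the sample size and the function names are all hypothetical. The simulated treatment has no effect at all, so any difference between the groups is produced by the way patients are assigned.

```python
import random
import statistics


def mean_difference(assign, n=4000, seed=1):
    """Mean outcome of treated minus control for a treatment with NO true effect.

    `assign(health, rng)` returns True if the patient receives the treatment.
    All quantities are hypothetical illustration values.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)             # baseline health (assumed scale)
        outcome = health + rng.gauss(0.0, 1.0)   # outcome depends on health only
        (treated if assign(health, rng) else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


def physician_choice(health, rng):
    # Assumption: healthier-looking patients tend to be given the operation.
    return health + rng.gauss(0.0, 0.5) > 0


def coin_flip(health, rng):
    # Random assignment ignores the patient's health entirely.
    return rng.random() < 0.5


biased = mean_difference(physician_choice)
randomized = mean_difference(coin_flip)
print(f"nonrandom assignment: apparent benefit = {biased:.2f}")
print(f"random assignment:    apparent benefit = {randomized:.2f}")
```

Under these assumptions the nonrandom comparison shows a large apparent benefit, while the randomized comparison stays close to zero, which is the pattern the shunt studies displayed.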
Good research does matter. Even physicians, highly selected for intelligence and highly
trained in intuitive assessment, were misled by their daily experience. Because the formal studies were
publicly available, and because the quality of the studies could be evaluated on the basis of their
experimental method, the overall conclusions were decisive. Until the human potential movement agrees on
the importance of quantitative evaluation, it will remain split into factions based on ideologies
maintained by personal experience.
Formal research methods are not the only or necessarily best way to learn about the true state
of nature. But good research is the only way to ensure that real phenomena will drive out illusions.
The story of the "discovery" of N-rays in France in 1903 reveals how even physics, the hardest of the
hard sciences, could be led astray by subjective evaluation (Broad & Wade, 1982, p. Ilk). This "new"
form of X-rays made sparks brighten when viewed by the naked eye. The best physical scientists in
France accepted this breakthrough because they wanted to believe in it. It took considerable logical
and experimental effort to convince the scientific establishment that the actual phenomenon was
self-deception. Good research can disconfirm theories; subjective judgment rarely does.
In his critique of the use of poor research practices, Pitfalls in Human Research, Barber (1976)
points out that many flaws of natural inference can creep into scientific research. "The validity and
generalizability of experiments can be significantly improved by making more explicit the pitfalls that
are integral to their planning...and by keeping the pitfalls in full view of researchers who conduct
experimental studies" (pp. 90-91). While scientists and scientific methods are not immune to the flaws
of subjective judgment, good research is designed to minimize the impact of these problems.
The proper use of science in public policy involves replacing a "person-oriented" approach
with a "method-oriented" approach (Hammond, 1978). When critics or supporters focus on the person
who is setting policy criteria, the debate involves the bias and motivations of the people involved. But
attempts to precisely define the variables of interest and to gather data that relate to these variables
focus the adversarial debate on the quality of the methods used. This "is scientifically defensible not
because it is flawless (it isn't), but because it is readily subject to scientific criticism" (Hammond,
1978, p. 135).
Intuitive Judgment and the Evaluation of Evidence: A Summary
Personal experience seems a compelling source of evidence because it involves the most basic
processing of information: perception, attention, and memory storage and retrieval. Yet while we have
great confidence in the accuracy of our subjective impressions, we do not have conscious access to the
actual processes involved. Psychological experimentation has revealed that we have too much
confidence in our own accuracy and objectivity. Humans are designed for quick thinking rather than
accurate thinking. Quick, confident assessment of evidence is adaptive when hesitation, uncertainty and
self-doubt have high costs. But natural shortcut methods are subject to systematic errors and our
introspective feelings of accuracy are misleading.
These errors of intuitive judgment lead people to search out confirming evidence, to interpret
mixed evidence in ways that confirm their expectations, and to see meaning in chance phenomena.
This same biased processing of information makes it very difficult to change our beliefs and to
understand the point of view of those with opposing beliefs. These errors and biases are now
well-documented by psychologists and decision theorists, and the improvement of human judgment is of
central concern in current research. The long-term response to this knowledge requires broad
educational programs in basic statistical inference and formal decision-making, such as those proposed and
examined by various authors in Kahneman et al. (1982). Already, business schools include
"de-biasing" procedures in their programs of formal decision-making. But with the complex technological
nature of our society, most researchers believe that some instruction in how to be a better consumer of
information should start in public schools.
The immediate response should be a renewed commitment to formal approaches in deciding
important policy, and a new realization that personal experience cannot be decisive in forming such
policy. As Gilbert, Light and Mosteller (1978) point out in their review of the efficacy of social
innovations, only true experimental trials can yield knowledge that is reliable and cumulative. While
formal research is slow and expensive, and scientific knowledge increases by tiny increments, the final
result is impressively useful. Perhaps most important, explicit public evidence is our best hope for
moving toward a consensus on appropriate public policy.
Footnote 2. (Indicator2 should be added to Page 51, line 11,
r=.1~2)
After preparation of this paper we learned of a possible
problem in the randomization procedures employed by the
investigator contributing the largest number (9) of Ganzfeld
studies to the set of 28 summarized in this section. Accordingly
we constructed Table 4a to investigate the effect on the mean and
median effect sizes of omitting all the studies conducted by this
investigator. The top half of Table 4a shows this effect when we
consider the 28 studies as the units of analysis. Omitting the
9 questioned studies lowers the mean effect size from .28 to .26
and does not change the median effect size, which remains at .32.
The lower half of Table 4a shows this effect when we consider the
10 investigators as the units of analysis. Omitting the
investigator in question lowers the mean effect size from .23 to
.22 but raises the median effect size from .32 to .34. It seems
clear that the questioned randomization of the 9 studies of this
investigator cannot have contributed substantially to an
inflation of the overall effect size.
Representative terms from entire chapter:
intuitive judgment