9 Paranormal Phenomena

BACKGROUND

The primary purpose of this chapter is to evaluate the scientific evidence on parapsychological techniques in selected areas. A more complete understanding of the topic, however, requires that we provide background on the military's interest in these phenomena and treat the conceptual issue of how people come to believe as they do. This background section includes a discussion of the phenomena and the military's interest in them as well as an overview of the committee's focus. A brief examination of the different kinds of justifications for the claims is followed by a more detailed treatment of the evidence in areas that have produced large literatures: remote viewing, random number generators, and what are called Ganzfeld (whole visual field) experiments. In addition, we describe experimental work that the committee actually witnessed by visiting a parapsychological laboratory. Despite the growing scientific tradition in some of these areas, many people continue to rely on qualitative or experiential evidence to support their beliefs; we discuss the problems associated with qualitative evidence in conjunction with the research on cognitive and emotional biases, which is reviewed in the paper by Dale Griffin (Appendix B). Finally, the chapter summarizes the committee's major conclusions.

THE NATURE OF THE PHENOMENA

Parapsychologists divide psi (the term applied to all psychic phenomena) into two broad categories: extrasensory perception (ESP) and
psychokinesis (PK). Included in ESP are telepathy, precognition, and clairvoyance, all of which refer to methods of gathering information about objects or thoughts without the intervention of known sensory mechanisms. Popularly called "mind over matter," PK refers to the influence of thoughts upon objects without the intervention of known physical processes.

A presentation to the committee by several military officers described in some detail the results of experiments in remote viewing carried out at both SRI International and the Engineering Anomalies Research Laboratory at Princeton University. In these experiments, subjects are said to have described, more or less accurately, a geographical location being visited by a target team. Although the human subjects have no normal way of knowing the target location, the examples recounted appear, at first glance, to indicate some striking correspondences between their descriptions and the actual sites. Some people have related these studies to reported out-of-body experiences.

The presentation included discussion of psychic mind-altering techniques, the levitation claims of transcendental meditation groups, psychotronic weapons, psychic metal bending, dowsing, thought photography, and bioenergy transfer. It was indicated that the Soviet Union is far ahead of the United States in developing potential applications of such paranormal phenomena, in particular psychically controlling and influencing minds at a distance. At the presentation, personal accounts were given of spoon-bending parties, in which participants believe they have caused cutlery to bend with the power of their minds, as well as instances of self-hypnosis to control pain and cure illness, walking barefoot on fire and handling hot coals without being burned, leaving one's body at will, and bursting clouds by psychic means.
The media and popular publications, especially in recent years, have discussed various aspects of psychic warfare. Three recent books, by Ebon (1983), McRae (1984), and Targ and Harary (1984), have attempted to document Soviet and American efforts to develop military and intelligence applications of alleged paranormal phenomena. These accounts have been augmented by newspaper stories, magazine articles, and television programs. Many of these sources acknowledge the speculative nature of the proposed applications, but others report that some of the techniques already exist and work. The claimed phenomena and applications range from the incredible to the outrageously incredible. The "antimissile time warp," for example, is supposed to deflect attacking nuclear warheads so that they travel back in time and explode among the ancient dinosaurs, thereby leaving us unharmed but destroying many dinosaurs (and, presumably, some of our evolutionary ancestors). Other psychotropic weapons, such
as the "hyperspatial nuclear howitzer," are claimed to have equally bizarre capabilities. Many of the sources cite the claim that Soviet psychotropic weapons were responsible for the 1976 outbreak of Legionnaires' disease as well as the 1963 sinking of the nuclear submarine Thresher.

POTENTIAL MILITARY APPLICATIONS

Some people, including some military decision makers, can imagine potential military applications of the two broad categories of psychic phenomena. In their view, ESP, if real and controllable, could be used for intelligence gathering and, because it includes "precognition," could also be used to anticipate the actions of an enemy. It is believed that PK, if realizable, might be used to jam enemy computers, prematurely trigger nuclear weapons, and incapacitate weapons and vehicles. More specific applications envisioned involve behavior modification; inducing sickness, disorientation, or even death in a distant enemy; communicating with submarines; planting thoughts in individuals without their knowledge; hypnotizing individuals at a distance; psychotropic weapons of various kinds; psychic shields to protect sensitive information or military installations; and the like. One suggested application is the concept of a "First Earth Battalion," made up of "warrior monks" who would have mastered almost all the techniques under consideration by the committee, including the use of ESP, leaving their bodies at will, levitating, psychic healing, and walking through walls.

THE COMMITTEE'S FOCUS

Although such colorful examples provide the context for our agenda, the cumulative body of data in the discipline of parapsychology enables us to judge the degree to which paranormal claims should be taken seriously. Since 1882, reports of both naturally occurring incidents and phenomena in laboratory settings have been accumulated in journals, monographs, and books.
Just to survey the reports in the refereed journals of parapsychology would be an enormous undertaking. As scientists, our inclination is, of course, to restrict ourselves to the evidence that purports to be scientific. But the alleged phenomena that have gained the most attention, and that have apparently convinced many proponents, do not come from the parapsychological laboratory. Nothing approaching a scientific literature supports the claims for psychotropic weaponry, psychic metal bending, out-of-body experiences, and other potential applications advocated by many proponents. The phenomena are real and important in the minds of proponents, so
we attempt to evaluate them fairly. Although we cannot rely solely on a scientific data base to evaluate the claims, their credibility ultimately must stand or fall on the basis of data from scientific research that is subject to adequate control and is potentially replicable.

We divided the task into two parts. First, we looked at the best scientific arguments for the reality of psychic phenomena. Our sponsors, as well as our own appraisal of the current status of parapsychology, indicated that the two most influential scientific programs were the experiments on remote viewing and the experiments on psychokinesis using random event generators. In addition, we looked at the research on the Ganzfeld (whole visual field) because it is, in the opinion of many parapsychologists, the most likely candidate for a replicable experiment. We also report on a parapsychological experiment that the committee itself witnessed.

Second, we considered the arguments of proponents who rely on what they call qualitative, as opposed to quantitative, evidence for the paranormal. Such evidence depends on personal experience or the testimony of others who have had such experience. Most, if not all, of this evidence cannot be evaluated by scientific standards, yet it has created compelling beliefs among many who have encountered it. Witnessing or having an anomalous experience can be more powerful than large accumulations of quantitative, scientific data as a method of creating and reinforcing beliefs. Because personal experience rather than scientific data has been the source of most beliefs in the paranormal, we have devoted some of our resources to considering this sort of cognitive method as a tool for achieving knowledge.

STANDARDS OF EVIDENCE

Diverse justifications have been offered for pursuing paranormal claims. One argument asserts that paranormal phenomena may no longer be anomalous, given the implications of contemporary quantum mechanics.
Indeed, a few physicists have supported some parapsychologists in maintaining that certain forms of precognition and psychokinesis are consistent with some interpretations of quantum theory. The other major argument is that we have no choice but to get involved because the Soviet Union already has a program to develop military applications of psychic phenomena. Several proponents, including some scientists, firmly believe that paranormal phenomena have been scientifically demonstrated several times over. At the same time, most scientists do not believe that psi exists. Many persons on both sides believe this paradox to be the result of irrational and dogmatic belief systems. The proponents accuse the critics of being closed-minded and bigoted. The critics imply that the
proponents have allowed wishful thinking to bias their judgment and that they are incompetent scientists and are self-deceived. Both sides can point to examples to back their positions.

One essential question confronts the committee: What does an impartial examination of the scientific evidence reveal about the existence of psi? Such an examination assumes that clear standards exist for judging the adequacy of the evidence, which, in turn, raises the issue of what constitutes sufficient evidence. That issue involves many difficult philosophical, theoretical, and methodological matters. For example, Palmer, in his "An Evaluative Report on the Current Status of Parapsychology" (1985), denies that current parapsychological experiments can provide any evidence for the existence of psi. This is because psi implies paranormality and, according to Palmer, we cannot argue that a given effect has a paranormal cause until we have an adequate theory of paranormality. He further argues, however, that parapsychological experiments can and do provide evidence for the existence of anomalies. By an anomaly, Palmer means a statistically significant deviation from chance expectation that cannot readily be explained by existing scientific theories. The burden of Palmer's paper is that just such anomalies have been demonstrated.

Because parapsychologists other than Palmer do not make this distinction between demonstrating an anomaly and testing a theory of paranormality, we do not maintain this distinction in our own assessment of the evidence. We tend to agree with Palmer on this matter, however. When we talk about evidence for psi in the remainder of this chapter, we are using psi in the neutral sense of an apparent anomaly rather than in the stronger sense of a paranormal phenomenon.

MINIMAL CRITERIA

Fortunately, critics and parapsychologists appear to agree on the general requirements necessary to demonstrate psi in a parapsychological experiment.
Both Palmer (1985) and James E. Alcock (Appendix B) discuss such criteria in their respective papers. As Palmer points out, psi is defined negatively as a statistical departure from a chance baseline that cannot be accounted for by chance, sensory cues, or known artifacts. Such a negative definition implies the minimal criteria required to justify a conclusion that psi has been demonstrated. Given the statistical aspect, it is imperative that the data be collected in such a way that the underlying probability model and assumptions of the statistical test are fulfilled. This means that targets must be adequately randomized and that each trial in the experiment must be independent of the preceding ones and, of course, the statistical procedures must be
applied and interpreted correctly. Given that all ordinary explanations must be ruled out, the experimenter must take special precautions to ensure that sensory cues, recording errors, subject fraud, and other alternatives have been prevented. Although it is impossible to rule out completely every possible contaminant or to anticipate every alternative, there are reasonable standards that most parapsychologists would agree should be followed.

Because different research paradigms have their own special requirements, no single set of standards can be specified in advance for all parapsychological experiments. Experiments with electronic number generators, for example, rarely have problems with data recording, but they do require special methods, such as tests of randomness and attention to the immediate physical environment, that are unnecessary with more traditional parapsychological experiments. One requirement for assessing the adequacy of a given experiment is that its procedures and methods of analysis be adequately documented. Unless we know how the targets were selected, how the results were analyzed, how the possibility of sensory leakage was prevented, and how other such aspects of the study were carried out, we have no basis for evaluating the quality of the information provided by the experiment.

GLOBAL CRITERIA

The criteria mentioned in the preceding paragraphs apply to the individual experiment. More global criteria come into play when one wants to evaluate an entire research program or set of experiments. Here we look for such things as replicability, robustness, lawfulness, manipulability, and coherent theory. These criteria deal with the coherence and intelligibility of the alleged phenomena. It is in terms of such global criteria that parapsychological research has been especially vulnerable.

Much of the objectivity involved in assessing the adequacy of research applies to judging individual experiments.
But science is cumulative and depends not so much on the outcome of a single experiment as on consistent and lawful patterns of results across many experiments carried out in a variety of independent settings. Lawful consistency in this sense, according to both parapsychologists and their critics, has never been found in parapsychological investigations in the history of psychic research. Recently, a few parapsychologists have expressed the hope that the experiments on remote viewing, random number generators, and the Ganzfeld (the very ones we have chosen to examine in detail in this report) may actually yield the long-sought replicability. The type of replicability that has been claimed so far is the possibility of obtaining significant departures from the chance baseline in only a proportion of
the experiments, which is a kind of replicability quite different from the consistent and lawful patterns of covariation found in other areas of inquiry.

Despite the fact that scientific progress in a given area depends on the accumulation of lawful and consistent patterns across many experiments, the methods for deciding that such consistency exists are still quite primitive in comparison with the standards for judging the adequacy of a single experiment. Indeed, it is only within the past few years that serious attention has been devoted to developing objective and standardized procedures for evaluating the consistencies across a body of independent studies. For the most part, judgment about what a body of investigations demonstrates is still a surprisingly intuitive and haphazard process. This probably has not been a serious drawback in those areas of inquiry in which the basic phenomena are robust and experiments can be conducted with high confidence that the predicted relations will be obtained; but such impressionistic means for aggregating the outcomes of several experiments in the domain of parapsychology open the door to all the motivational and cognitive biases discussed in the paper prepared for the committee by Griffin. Not only are the data and alleged correlations erratic and elusive in this field, but their very existence is open to question.

EVALUATION OF THE SCIENTIFIC EVIDENCE

To evaluate the best scientific evidence on the existence of psi, and with the advice of proponents and our sponsors, we conducted site visits to some of the most notable parapsychological laboratories. The parapsychology subcommittee (see Appendix C) visited Robert Jahn's Engineering Anomalies Research Laboratory at Princeton University, where it witnessed presentations and demonstrations regarding psychokinetic experiments on random number generators.
Jahn and his associates also briefed the subcommittee on the current status of their work in remote viewing. The subcommittee also visited Helmut Schmidt's laboratory at the Mind Science Foundation, San Antonio, Texas. Schmidt pioneered the use of random number generators in parapsychology experiments in 1969. His is considered one of the two major research programs on psychokinesis (the second is Jahn's).

As an additional possible input, the committee agreed to participate in a psychokinetic experiment of new design with Helmut Schmidt. Specifically, Schmidt accepted the suggestion that the committee's consultant, Paul Horwitz, be included in the conduct of the experiment. The
work has not yet begun, however, and it now appears that we will not have any results to report before our terms expire.

The chair of the parapsychology subcommittee also visited SRI International, another major laboratory studying psychic effects on random number generators. (This latter research group argues that the observed effects are not due to psychokinesis but rather represent a special form of precognition.) The subcommittee chair also attended the meetings of the Parapsychological Association held at Sonoma State College in California. The entire committee made a site visit to Cleve Backster's laboratory in San Diego (arranged to coincide with the committee's meeting in La Jolla, California).

These site visits enabled the committee to observe firsthand the experimental arrangements and equipment used by some of the major contributors to parapsychological research. They also provided us an opportunity to discuss results, interpretations, and problems with a few important investigators. We were impressed with the sincerity and dedication of these investigators and believe that they are trying to conduct their research in the best scientific tradition. We also came away with the impression that this type of research involves many unresolved problems and still has a long way to go before it develops standardized, easily replicable procedures.

The information obtained from these site visits does not provide an adequate basis for making scientific judgments. For this we rely, as we would in other fields of science, on a careful survey of the literature.

RESEARCH ON REMOTE VIEWING

The SRI Remote Viewing Program

Since the early 1970s, probably the best-known research program in parapsychology has been the experiments in remote viewing initiated by physicists Harold Puthoff and Russell Targ when they were at SRI International.
In a typical remote viewing experiment, a subject, or percipient, remains in a room or laboratory with an experimenter while a target team visits a randomly selected geographical site (e.g., a shopping mall, an outdoor arena, the Palo Alto airport, Hoover Tower). Neither the experimenter nor the subject has been given any information about the target. Once the experimenter and the subject are closeted in the laboratory, they wait for 30 minutes before the subject begins to describe his or her impressions of the target site. Meanwhile, the target team, consisting of two to four members of the SRI staff, obtains instructions for going to a randomly chosen target site from another SRI staff member. They then drive to the
designated target site and remain there for an agreed-on 15-minute period (after allowing approximately 30 minutes to reach the site). During the time that the target team remains at the target site, the subject describes his or her impressions into a tape recorder and also makes any drawings that would help to clarify those impressions. When the target team returns to the laboratory, all the participants listen to the tape recording of the subject's impressions. Then all the participants go to the target site, where the subject is allowed to see how closely his or her impressions agreed with the actual target.

The first subject to participate in such a formal series of trials was the late Pat Price. In the first series, consisting of nine sessions, the duration of each session was 30 minutes. The transcript for each session is rich in detail; the one published transcript in Targ and Puthoff's first book runs to almost six printed pages (Targ and Puthoff, 1977).

Given such data, how does one decide if the experiment was a success? Did Price's descriptions, for example, convey correct knowledge of the different target sites? In fact, two methods have been used to demonstrate the effectiveness of remote viewing. One method is simply to compare the description with the target and make a judgment as to whether the correspondence is sufficient to claim a "hit." The second method uses an independent judge to rank the degree to which each description matches each site and then applies statistical tests to decide if the association is greater than chance.

Unprecedented success was claimed for the early remote viewing experiments in terms of both methods (Targ and Puthoff, 1974, 1977; Puthoff and Targ, 1976). Many examples were supplied of dramatic correspondences between impressions of the percipient and the physical details of the actual target. Such correspondences, no matter how dramatic and compelling,
do not carry scientific weight, because it is impossible to assess their probabilities. In addition, much psychological research indicates how such subjective validation can create strong, but false, illusions of matching (see below).

The more formal evidence from the rankings of independent judges was also impressive. The first formal series of nine trials resulted in seven of the transcripts being ranked 1 against their intended target sites by the independent judge. Only one such ranking would be expected by chance. Puthoff and Targ reported the probability of such an outcome being due to chance as only 0.0000029. The second formal series, with Hella Hammid as the subject, was equally impressive, producing five first places and four second places in the rankings of transcripts against target sites. Although subsequent series by Targ and Puthoff, as well as by
other investigators, have not always yielded such overwhelmingly impressive results, most of them have continued to display highly significant outcomes (Targ and Harary, 1984).

On the surface, at least, this is a reliable, simple, and highly effective recipe for producing paranormal communication. Especially appealing is the claim that remote viewing works with just about everyone. Targ and Harary, for example, provide exercises for anyone who wants to develop and improve his or her ability to pick up information at remote sites. Neither space nor time, its proponents assert, is a barrier. The percipient can pick up information from the surface of Jupiter as well as from target sites that can be visited at some future time.

Scientific Assessment of Remote Viewing

After the first remote viewing experiments were conducted in the early 1970s, many investigators throughout the world tried to follow suit. Most of them believed that their findings supported the claims of the SRI International researchers. The majority of these experiments, however, consisted of informal demonstrations rather than formal scientific experiments and relied solely on subjective matching. In the past 15 years, the number of formal experimental replications of the SRI remote viewing experiments has been surprisingly small.

Targ and Harary (1984) include as an appendix in their book a report by Hansen, Schlitz, and Tart that evaluates all the known remote viewing experiments conducted from 1973 through 1982. "In an examination of the twenty-eight formal published reports of attempted replications of remote viewing," write Targ and Harary, "Hansen, Schlitz, and Tart at the Institute for Parapsychology found that more than half of the papers reported successful outcomes." They concluded: "We have found that more than half (fifteen out of twenty-eight) of the published formal experiments have been successful, where only one in twenty would be expected by chance."
Two comments may be in order with respect to the foregoing conclusion. First, given the enormous publicity and the unusually strong claims, 28 formal experiments in 10 years seems surprisingly few. In comparison, the Ganzfeld psi experiments produced approximately twice as many formal experiments during the same interval. Second, 13 of the 28 formal experiments, or 46 percent, failed to claim successful outcomes. This rate of failure is much higher than what might have been expected on the basis of the earlier claims by Targ and Puthoff (1977), namely, that they had succeeded with every subject they had tried.
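The "one in twenty" comparison can be made explicit with a binomial model. The sketch below is illustrative only and rests on an assumption the original report does not make: that each of the 28 reports is an independent test with a 5 percent chance of a "successful" (significant) outcome when chance alone operates. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k
    'successful' experiments out of n if each succeeds with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_EXPERIMENTS = 28  # formal reports examined by Hansen, Schlitz, and Tart
N_SUCCESSES = 15    # reports classified as successful
ALPHA = 0.05        # "one in twenty" expected by chance

expected = N_EXPERIMENTS * ALPHA
p_value = binomial_tail(N_SUCCESSES, N_EXPERIMENTS, ALPHA)
print(f"successes expected by chance: {expected:.1f}")
print(f"P(at least {N_SUCCESSES} of {N_EXPERIMENTS}): {p_value:.1e}")
```

Under this naive model only about 1.4 successes would be expected, and 15 or more is far less likely than one in a million. The model, however, treats every report as an independent, equally valid test. The objections that follow concern exactly that assumption.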
Even 15 successful outcomes out of 28 tries is impressive, especially by parapsychological standards. An inspection of the listed studies, however, suggests that the 28 formal experiments vary considerably in their importance. Some of these "published formal experiments" appeared as brief reports or abstracts of papers delivered at meetings of the Parapsychological Association or similar organizations. Others appeared in print only as brief or informal reports in book chapters or letters to the editor. Altogether, 15 of the 28 were published under conditions that fall short of scientific acceptability. Only 13, or 46 percent, of the experiments were published under refereed auspices. As in other sciences, only published reports that have undergone peer review and are adequately documented can be considered seriously as part of the scientific data base.

Of the 13 scientifically reported experiments, 9 are classified as successful in their outcomes by Hansen et al. (Targ and Harary, 1984). Seven of these nine experiments were conducted by Targ and Puthoff at SRI International, the remaining two at other laboratories. This relatively small harvest of nine "successful" experiments suffers from the fact that each is seriously flawed. A variety of problems afflicts the published reports on remote viewing. The documentation, even according to many parapsychologists, is seriously inadequate. Attempts by both neutral and skeptical investigators to gain access to the raw data have typically been thwarted or strongly resisted. Because the essence of scientific justification is public accessibility to the data, this relative inaccessibility suggests that much of the remote viewing data base is not part of science.
Most of the reasons for questioning the acceptability of the evidence for remote viewing lie in a methodological flaw that characterizes all but one of the experiments deemed successful: the successive trials are not independent of one another. This lack of independence has unfortunate consequences for any attempt to draw conclusions about ESP based on the outcomes of such experiments. The concept of independence is technical and somewhat difficult to explain simply, but, since it is critical to understanding why the remote viewing experiments fail to make their case, we supply an intuitive explanation.

Assume that we are considering a remote viewing experiment in which the subject participates in only two trials. In other words, we deal with two randomly chosen target sites. For the first trial, the target team goes to the first target site and remains there while the subject produces his or her first description. Immediately after this trial, the target team returns to the laboratory and takes the subject to the actual target site so that he or she and the others can gain a
subjective impression of how closely the description corresponds with the target. For the second trial, the target team visits a second randomly chosen site. While they are visiting this site, the subject produces a second description.

When the experiment is over, the list of target sites (in random order) and the transcripts of the subject's descriptions are given to a judge, who also visits each site. While at a given site, the judge reads the two transcripts and ranks them in terms of how well each one corresponds with the particular site. In our example, one of the transcripts will be ranked 1 and the other will be ranked 2 (with 1 indicating the better correspondence between that target and the transcript). After visiting one site and doing this ranking, the judge then visits the second site and repeats the ranking procedure.

The raw data can be set out in a matrix with the target sites as the columns and the transcripts as the rows. A perfect outcome would be indicated if the transcript produced at the time the team was visiting site A was ranked 1 against that site, and the transcript produced when the team was visiting site B was ranked 1 for that site. (Of course, two trials would be too few to make an adequate statistical assessment of the success of the matching; successful matching would occur too frequently just by chance. The principles we want to illustrate, however, remain the same for two as for many trials.)

If the successive trials in the experiment were independent of one another, and we were interested only in direct hits (that is, outcomes for which the intended transcript was rated 1 against the target site), then we could expect the subject to make between zero and two direct hits. Indeed, if chance alone were operating, there would be four equally likely possibilities: (1) no hits, (2) a hit on the first trial and a miss on the second, (3) a miss on the first trial and a hit on the second, and (4) two hits.
By this reckoning, the subject could be expected to get two direct hits just by chance in one of every four experiments. But, as we indicated, the successive trials are not independent. This is because the judge is almost certainly not going to rank a transcript as 1 for more than one target site. This means, in our example, that if he or she ranks the first transcript 1 for target A, then he or she will probably rank the second transcript 1 for target B. In effect, this lack of independence between trials means that, instead of four equally likely possible outcomes, there are only two: no hits or two hits. The dependence between trials has created a situation in which the chance probability of two hits is now 50 percent rather than 25 percent.
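The two-trial argument can be checked by brute-force enumeration. This is a minimal sketch under two assumptions taken from the example above: under independence, each trial's own transcript has a 1-in-n chance of being ranked 1 against its site; under the judge's one-to-one ranking, the rank-1 assignments form a random permutation of transcripts onto sites. The function names `independent_hits` and `matched_hits` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def independent_hits(n: int) -> dict[int, Fraction]:
    """Hit distribution if each of n trials independently has a 1-in-n
    chance that its own transcript is ranked 1 (binomial model)."""
    p = Fraction(1, n)
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def matched_hits(n: int) -> dict[int, Fraction]:
    """Hit distribution if the judge ranks each transcript 1 for exactly
    one site, so the rank-1 assignments form a random permutation."""
    perms = list(permutations(range(n)))
    dist = {k: Fraction(0) for k in range(n + 1)}
    for perm in perms:
        # perm[site] is the transcript ranked 1 for that site
        hits = sum(site == transcript for site, transcript in enumerate(perm))
        dist[hits] += Fraction(1, len(perms))
    return dist


for n in (2, 3):
    print(f"{n} trials: P(all hits) independent = {independent_hits(n)[n]},"
          f" matched = {matched_hits(n)[n]}")
```

For two trials this reproduces the figures in the text: 1/4 under independence versus 1/2 when the judge's rankings are one-to-one, with one hit impossible in the matched case. For three trials the gap widens to 1/27 versus 1/6, which illustrates how a test that assumes independence overstates significance.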
In this situation, if an experimenter uses a statistical test that assumes independence, he or she will come out with the wrong probabilities. In fact, the statistical test will exaggerate the significance of many outcomes. The failure of the experimenters to realize this problem resulted in exaggerated levels of significance for the early remote viewing experiments. Kennedy (1979), who originally pointed to this problem, recalculated the probabilities for some of these experiments. Puthoff and Targ (1976) reported that five of their first six remote viewing experiments were significant at the .05 level. With Kennedy's corrections for lack of independence, only two remained significant. According to Kennedy, only one of the two successful replications by Bisaha and Dunne (1979) remained significant with the more appropriate test.

One reason for the optimistic initial beliefs in the scientific reality of remote viewing was the fact that the lack of independence between trials produced exaggerated odds against chance results. But even with conservative corrections for lack of independence, approximately one-third of the early experiments still yielded successful outcomes.

One easy way to avoid this problem of dependence is to use a separate target pool of possible sites for each trial. For example, for the first trial one could designate a pool of four possible sites, one of which is randomly chosen to be the actual target site. A second pool of four different possible sites would be used for the second trial. When the trials are completed, the judge is given the list of the four sites for the first trial along with the subject's description for that trial. The judge then ranks each site in terms of its correspondence to the description. The four possible sites for the second trial are then ranked in terms of their correspondence to the subject's description for the second trial.
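To see how an independence-assuming test can overstate significance, the Python sketch below compares two chance probabilities of getting at least k direct hits out of n trials:

- the binomial figure a naive test would use, with each trial a hit with probability 1/n;
- the exact figure when the judge's matching is a random one-to-one assignment (a permutation) of the n sites.

This is a simplified illustration based only on direct hits. It is not a reconstruction of Kennedy's (1979) actual recalculations, and the function names are our own.

```python
from math import comb, factorial


def derangements(m):
    """Number of orderings of m items in which no item is in its own place."""
    d = [1, 0]  # d[0] = 1, d[1] = 0
    for k in range(2, m + 1):
        d.append((k - 1) * (d[k - 1] + d[k - 2]))
    return d[m]


def tail_if_matching_is_permutation(n, k):
    """Exact P(at least k direct hits) when n transcripts are matched one-to-one to n sites."""
    ways = sum(comb(n, j) * derangements(n - j) for j in range(k, n + 1))
    return ways / factorial(n)


def tail_if_trials_independent(n, k, p):
    """Binomial P(at least k hits) when each of n trials is an independent hit with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Two sites, two hits: the naive test says 1 in 4; the correct chance figure is 1 in 2.
print(tail_if_trials_independent(2, 2, 1 / 2), tail_if_matching_is_permutation(2, 2))  # 0.25 0.5
# Nine sites, three or more hits: the naive figure is again too small.
print(tail_if_trials_independent(9, 3, 1 / 9), tail_if_matching_is_permutation(9, 3))
# With a separate four-site pool on every trial, the binomial model is exact.
print(tail_if_trials_independent(2, 2, 1 / 4))  # 0.0625, i.e., 1 in 16
```

For small k the ordering can reverse. This is consistent with the statement that a naive test exaggerates the significance of many outcomes, not all of them.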
In this illustration, the subject has a probability of 1 in 4 of having the actual target site ranked 1 on each trial, or a probability of 1 in 16 of being correct on both trials. This second procedure, which is used in most free-response parapsychological experiments (such as the Ganzfeld experiments discussed below), not only guarantees independence between successive trials but also avoids other serious problems, which we discuss next.

The fact that the subject is given feedback by being taken to the target site immediately after each trial creates an additional form of dependence between trials. For this reason, other possibilities exist for obtaining "successful" results artifactually. The transcripts can contain clues that provide nonparanormal reasons for judges to associate descriptions with targets correctly. Some of these
clues can be quite overt, such as when a subject mentions in the description how the current target apparently differs from a previous target site. When such a clue appears in the description, it provides the judge with information that the current description does not belong with the previous site. This increases the probability that the description will be matched with its appropriate target. Marks and Kammann (1978) initiated a controversy by claiming that such overt clues were sufficient to account for the striking results of the very first SRI remote viewing series with Pat Price. Targ and Puthoff did not deny the existence of such clues in the Price series but argued that they were not sufficient to have accounted for the results. This dispute still has not been settled (Tart, Puthoff, and Targ, 1980; Scott, 1982; Marks and Scott, 1986).

Possibly this controversy over the role of the more overt clues has deflected attention from a much more fundamental and fatally damaging criticism, first made by Hyman (1979) and independently by Kennedy (1979). Hyman and Kennedy pointed out that the combination of immediate feedback and lack of independence between successive trials makes it virtually impossible to prevent sensory cueing in the transcripts. As long as the subject and the experimenter who is closeted with the subject know the preceding target sites, there is no way to prevent the transcript from being affected, in a variety of possible and perhaps subtle ways, by knowledge of the preceding targets. Hyman (1984-1985) provides an illustration of how such implicit sensory cueing might occur (pp. 131-132):

Say that the target for the first session was the Hoover Tower at Stanford. This will almost certainly influence what both the viewer and the interviewer say during the second and subsequent sessions in the same series.
Almost certainly the viewer, during the second session, will not supply an exact description of the Hoover Tower. So, whatever the viewer says during the second session, a judge should find it to be a closer match to the second target site than to the first one. Now, assume that the second target site happened to be the Palo Alto train station. The viewer's descriptions during the third session will avoid describing either the Hoover Tower or the Palo Alto train station. We do not need to hypothesize something as mysterious as psi to predict that a judge should find this third description a better match to the third target site than either of the first two. As we add sessions, this effect of immediate feedback should continue to make the correlation between the viewer's descriptions and the target sites better and better.

No amount of editing for overt clues can overcome this defect of remote viewing experiments that follow the SRI pattern of dependent trials and immediate feedback. The mechanism described by Hyman
should result in some dramatic correspondences. These dramatic correspondences, in conjunction with subjective validation, are a highly potent recipe for creating the illusion (for both experimenters and subjects) that ESP has occurred.

Palmer (1985), a major parapsychologist who otherwise carefully considers the criticisms of parapsychology, misses the seriousness of this flaw. In mentioning Hyman's criticism, he writes (p. 50):

It has been suggested by Hyman (1979) that since the subjects in most cases received feedback of the correct target after each trial, the subject could have gained some advantage by avoiding to mention characteristics of targets in earlier trials in their responses in later trials. As noted by Targ, Puthoff, and May (1979), the target pool for the geographical-site experiments was sufficiently large and contained sufficient redundancy that this is unlikely to be a significant biasing factor.

Perhaps such complacency has enabled experimenters to continue conducting remote viewing experiments with this fatal flaw. In fact, the size of the target pool, no matter how large, does not affect the validity of Hyman and Kennedy's criticism. Nor does the claim that the pool contained sufficient redundancy make much difference. Each geographical site is unique and contains a combination of specific characteristics that distinguishes it from the other sites in a given series. Indeed, as the parapsychologists themselves have asserted, unless this were so, there would be no possibility of the transcripts' being uniquely associated with a given target site.
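Hyman's cueing mechanism can be illustrated with a small simulation in which there is, by construction, no psi at all. The Python sketch below is a toy model built on our own assumptions, not a model of any actual experiment:

- each site is a random set of descriptive features;
- the viewer's description never depends on the current target;
- after each session the viewer simply avoids features of sites already visited;
- the judge credits each description to the site with the greatest feature overlap.

All names and parameter values are hypothetical. Because the judge compares each description only with the sites in the same series, nothing in the model depends on how large a pool the sites were drawn from.

```python
import random


def simulate_series(rng, n_sessions=6, vocab_size=30, site_features=6,
                    description_features=5, feedback=True):
    """Count direct hits in one simulated remote-viewing series with no psi.

    Toy model (an assumption): each target site is a random set of features
    drawn from a fixed vocabulary.  The description never depends on the
    current target.  With feedback, the viewer avoids features of sites
    already visited.  The judge credits each description to the site with
    the largest feature overlap, breaking ties at random.
    """
    vocab = range(vocab_size)
    sites = [set(rng.sample(vocab, site_features)) for _ in range(n_sessions)]
    seen = set()
    hits = 0
    for i in range(n_sessions):
        allowed = [f for f in vocab if f not in seen]
        k = min(description_features, len(allowed))
        description = set(rng.sample(allowed, k))  # independent of sites[i]
        overlaps = [len(description & site) for site in sites]
        best = max(overlaps)
        chosen = rng.choice([j for j, o in enumerate(overlaps) if o == best])
        hits += chosen == i
        if feedback:
            seen |= sites[i]  # viewer is shown this session's target
    return hits


def hit_rate(feedback, n_series=2000, n_sessions=6, seed=1):
    """Average proportion of direct hits over many simulated series."""
    rng = random.Random(seed)
    total = sum(simulate_series(rng, n_sessions=n_sessions, feedback=feedback)
                for _ in range(n_series))
    return total / (n_series * n_sessions)


print(f"chance rate        {1 / 6:.3f}")
print(f"without feedback   {hit_rate(feedback=False):.3f}")
print(f"with feedback      {hit_rate(feedback=True):.3f}")
```

Without feedback, the simulated hit rate stays near the chance value of 1 in 6. With feedback alone it rises well above chance, and in this model the gain is concentrated in the later sessions of each series, as Hyman's example predicts.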
In every one of the remote viewing experiments that allows the possibility of subtle cueing, it is highly plausible that the judges could make completely successful matchings because of this artifact. As long as a highly plausible, normal alternative to ESP can account for the apparent success of the outcomes, the parapsychologists, by their own standards, cannot claim evidence for paranormal transmission of information.

As it turns out, all but one of the nine scientifically reported studies of remote viewing (at the time of the Targ and Harary survey) suffer from the flaw of sensory cueing. The one experiment that cannot be faulted for this reason is the long-distance remote viewing experiment of Schlitz and Gruber (1980). However, as Hyman (1984-1985) has pointed out, this experiment suffers from another very serious flaw. Gruber, who was a member of the target team and thus was familiar with the targets, translated the subject's target descriptions into Italian for the judging process. Why the experimenters allowed such a potential source of bias is not known, but the violation obviously negates the results as evidence for psi.

Since the Targ and Harary survey, we have learned of two attempts
to replicate the Schlitz and Gruber experiment without the flaw mentioned. One, still unpublished, produced negative results. The second, by Schlitz and Haight (1984), produced marginally significant results. Indeed, if the more acceptable two-tailed test of significance had been used, the results would not have been considered significant by customary standards. Although the report of this study lacks sufficient documentation with respect to certain aspects of procedure, both Palmer (1985) and Alcock agree that this is the best-controlled and most methodologically sound of all the remote viewing experiments so far.

In summary, after approximately 15 years of claims and sometimes bitter controversy, the literature on remote viewing has managed to produce only one possibly successful experiment that is not seriously flawed in its methodology, and that one experiment provides only marginal evidence for the existence of ESP. By both scientific and parapsychological standards, then, the case for remote viewing is not just very weak, but virtually nonexistent. It seems that the preeminent position that remote viewing occupies in the minds of many proponents results from the highly exaggerated claims made for the early experiments, as well as the subjectively compelling, but illusory, correspondences that experimenters and participants find between components of the descriptions and the target sites.

RESEARCH ON RANDOM NUMBER GENERATORS

The Basic Paradigm

The use of random number (or random event) generators for parapsychological research began in the 1960s and became relatively standard during the 1970s as the technology became widely available. A random number generator (RNG) is simply an electronic device that uses either radioactive decay or electronic noise to generate a sequence of random symbols.
Originally such devices were used to test ESP, usually clairvoyance or precognition, but the most widespread and best-known work focuses on what is called micropsychokinesis, or micro-PK. In such research a subject, or operator, attempts to mentally bias the output of the random number generator so that it produces a nonrandom sequence. Most of the work with RNGs has used binary generators, or what Schmidt calls "electronic coin flippers." The output on each trial is either 0 or 1, that is, heads or tails. If the RNG is unbiased and truly random, then on control runs it should produce sequences of 0s and 1s that are independent of each other and that, in the long run, yield 1s 50 percent of the time.
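The two properties named here can each be checked with a simple z statistic: 1s should appear half the time, and successive outputs should be independent of one another. The Python sketch below is our own illustration rather than a test used by Schmidt or the Princeton group. Python's pseudorandom generator stands in for a hardware device, and the function names are hypothetical.

```python
import math
import random


def z_score(successes, trials, p=0.5):
    """Normal-approximation z statistic for a count of successes against rate p."""
    return (successes - trials * p) / math.sqrt(trials * p * (1 - p))


def frequency_and_serial_z(bits):
    """Two simple checks on a binary sequence.

    frequency: are 1s produced 50 percent of the time?
    serial:    is each output unrelated to the previous one?  For independent,
               unbiased bits, consecutive outputs agree half the time.
    """
    ones = sum(bits)
    agreements = sum(a == b for a, b in zip(bits, bits[1:]))
    return z_score(ones, len(bits)), z_score(agreements, len(bits) - 1)


# A pseudorandom stand-in for a control run of a hardware RNG.
rng = random.Random(2024)
control = [rng.getrandbits(1) for _ in range(100_000)]
print(frequency_and_serial_z(control))  # both values should usually lie within about +/-2

# A broken "generator" that simply alternates passes the 50 percent check
# but fails the serial check dramatically.
alternating = [i % 2 for i in range(100_000)]
print(frequency_and_serial_z(alternating))  # frequency z = 0.0, serial z is about -316
```

The alternating sequence produces exactly 50 percent 1s yet is completely predictable. A check of the overall rate alone is therefore only a limited test of randomness.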
In a typical experiment, a subject (either a person who claims to be a psychic or a person chosen for availability who does not make such claims) is placed in the vicinity of the RNG and attempts to bias the output toward either more or fewer 1s. When an animal is used as the subject, the RNG output is usually coupled to an outcome whose frequency the animal presumably would like to either increase or decrease. In an experiment carried out with cockroaches, for example, one outcome was electric shock. If, during the time the output of the RNG was coupled with the shock apparatus, the proportion of shocks decreased below 50 percent, this would be taken as evidence of a psychokinetic effect of the cockroach on the output of the RNG. The RNG experiments have been of interest to some military and governmental personnel because of the possibility, if such micro-PK is demonstrable, of psychically affecting equipment and computers that depend on the output of electronic symbols.

Results of the Experiments

In a recent survey of 56 reports published between 1969 and 1984 and dealing with research on possible psychokinetic perturbations of binary RNGs (Radin, May, and Thomson, 1985), the reviewers counted 332 separate experiments. Of the 332 experiments, 188 were reported in refereed journals or conference proceedings. Of these 188 experiments with some claim to scientific status, 58 reported statistically significant results (compared with the 9 or 10 experiments that would be expected by chance). The other 144 experiments were produced by the Engineering Anomalies Research Laboratory at Princeton University; none of them had been published in a refereed journal at the time of the survey. Of these 144 experiments, 13 were classified as yielding statistically significant results. So, in the total sample of 332 experiments, 71 yielded ostensibly significant results at the traditional .05 level.
This amounts to a success rate of approximately 21 percent, compared with the rate of 5 percent that would be expected by chance. Palmer (1985) and Alcock agree that such results cannot be accounted for by chance. In other words, both the parapsychologist and the skeptic, in their respective reviews of the RNG research, agree that something other than accidental fluctuation is producing these results. Palmer calls this something an anomaly, which, while it may or may not be paranormal, cannot be explained by current scientific theories. Alcock points to various defects in the experimental protocols and argues that no conclusions about the origins of these departures from randomness are justified until successful
outcomes can be more or less consistently produced with adequately designed and executed experiments.

Both Palmer and Alcock focus their reviews on the two most influential research programs on RNGs. One is the program of Helmut Schmidt, a quantum physicist who began working on psi and RNGs in 1969. The other is the program begun by Robert Jahn in the late 1970s, when he was dean of the School of Engineering and Applied Science at Princeton University (see Jahn, 1982). These two programs have accounted for almost 60 percent of all known experiments on RNGs. They have also been the most consistently successful in achieving statistically significant outcomes.

Although the results suggest that in each experimental group of trials the number of 1s is greater or less than the 50 percent baseline (depending on the intended direction), the actual degree of deviation from chance is quite small. As Palmer (1985) indicates, Schmidt's subjects have averaged approximately 50.5 percent hits over the years, compared with the expected baseline of 50 percent. This amounts to producing one extra 1 every 100 trials. The reason such a small departure from chance is statistically significant is that an enormous number of trials is conducted with each subject. Jahn and his colleagues at Princeton have, in a much shorter time, produced on the order of 200 times the number of trials that Schmidt did in 17 years. The Princeton researchers have also produced a significantly lower success rate than Schmidt. In their formal series of 78 million trials, the percentage of hits in the intended direction was only 50.02 percent, or an average of 2 extra hits every 2,500 trials. Again, such an extremely weak effect is statistically significant only when one is dealing with very large numbers of trials.

Scientific Assessment of the RNG Experiments

Palmer (1985) carefully reviews the major criticisms of the work of Schmidt and Jahn.
He addresses questions about security, because subjects are often left alone with the apparatus during data collection. In the Princeton experiments, the data are always collected when the subject is alone with the apparatus. Although the Princeton experiments now contain a number of features that would make it extremely difficult for a naive subject to bias the results, it is not clear that this has always been so. It would make good scientific sense to conduct some trials during which the subject is carefully monitored to see if successful outcomes are still obtained.

The major reservations about the RNG experiments concern the adequacy of the randomization of the outputs. Schmidt applied only limited tests for the randomness of his machines, and most of the
control trials were gathered by allowing the machine to run for long periods, usually overnight. Although these controls usually produced results in line with the chance baseline, critics have pointed out that the controls are unsatisfactory because they were not conducted for shorter runs and at the same time as the data from the experimental sessions.

Palmer grants that the critics are correct in pointing out some of the shortcomings in Schmidt's methods for testing and controlling for the randomization of his machines. Palmer also correctly points out that such criticism is somewhat blunted by the fact that the critics have not specified any plausible mechanisms that would account for the obtained differences between the experimental and control trials. He is correct in pointing out that the Princeton experiments provide more adequate controls. However, he has probably assumed that the baseline controls in the Princeton experiments were run at the same time as the two experimental conditions of hitting and missing. It is easy to interpret the somewhat ambiguous description of the procedure in this manner. The relevant part of the authors' methodological description is as follows (Nelson, Dunne, and Jahn, 1984:9):

The primary variable in these experiments is the operator's pre-recorded intention to shift the trial counts to higher or lower numbers. This directional intention may be the operator's choice, the so-called "volitional" mode, or it may be assigned by a specified random process, the "instructed" mode. In either mode, data are collected in a "tri-polar" protocol, wherein trials taken under an intention to achieve high numbers (PK+), trials taken under an intention to achieve low numbers (PK-), and trials taken as baseline, i.e. under null intention (BL), are interspersed in some reasonable fashion, with all other operating conditions held identical.
For all three streams of data, effect size is measured relative to the theoretical chance mean. This tri-polar protocol is the ultimate safeguard in precluding any artifacts such as residual electronic biases or transient environmental influences from systematically distorting the data.

At first glance it might appear as if the tri-polar protocol requires that the two types of experimental groups of trials and the baseline group of trials always be taken at the same session. This would be consistent with the claim that "any artifacts such as residual electronic biases or transient environmental influences" were thereby precluded "from systematically distorting the data." Such a claim would be justified if, in fact, one group of trials of each of the three types was obtained at each session. It would also require that each group of trials was of the same length and that the order of the three types of trials was independently randomized for each session. The description provided by Nelson and his colleagues says nothing
at all about the order in which the three conditions were conducted. A careful reading indicates that the baseline data may not always have been obtained at the same sessions and under the same conditions as the experimental groups of trials. It is not clear what the authors mean by stating that the three types of trials "are interspersed in some reasonable fashion." In fact, an examination of the data reported for each subject makes it clear that the strict tri-polar protocol could not possibly have been followed for much of the data collection. In many cases the baseline data are entirely absent or occur with many fewer trials than the experimental data. Indeed, it is not even clear that PK+ and PK- trials were always obtained at the same sessions, because for some subjects the total numbers of these trials are not equal.

We suspect that, over the six years or so during which the Princeton group was accumulating its data base, it made many changes in both the hardware and the experimental protocol. The sophisticated procedures currently in use are the most recent variations in the paradigm; these include the requirement that the three types of trials be of equal length and that one of each be conducted at each session. Unfortunately, the data are not presented in such a way that it is possible to determine whether the successful results are due to the earlier or the later experiments.

Such issues become especially important when we consider the extremely small size of the effect being claimed. They become more important still when we realize, as Palmer has pointed out, that the bulk of the significance in the formal series was due to just one subject, who contributed 23 percent of the total data. This one subject achieved a hit rate of 50.05 percent. When her data are eliminated, the remaining data yield a hit rate of 50.01 percent, which is no longer significantly different from chance.
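The arithmetic behind this point can be sketched with a normal approximation to the binomial. The Python below uses the rounded figures quoted in the text, so its results are approximate. For illustration, it also assumes that "23 percent of the total data" means 23 percent of the 78 million formal trials, and that each trial is an independent binary outcome with a chance hit rate of 0.5. The names are our own.

```python
import math


def z_for_rate(hit_rate, trials, p=0.5):
    """Normal-approximation z for an observed hit rate over many binary trials."""
    return (hit_rate - p) * math.sqrt(trials) / math.sqrt(p * (1 - p))


TOTAL_TRIALS = 78_000_000  # formal series size given in the text
SHARE = 0.23               # assumed: one subject's share of those trials
SUBJECT_RATE = 0.5005      # her hit rate (rounded, from the text)
REST_RATE = 0.5001         # all remaining data (rounded, from the text)

# Weighted average of the two groups should recover the overall rate.
combined = SHARE * SUBJECT_RATE + (1 - SHARE) * REST_RATE
print(f"combined hit rate: {combined:.5f}")  # about 0.5002, the overall figure
print(f"z, whole series:   {z_for_rate(0.5002, TOTAL_TRIALS):.2f}")
print(f"z, one subject:    {z_for_rate(SUBJECT_RATE, SHARE * TOTAL_TRIALS):.2f}")
print(f"z, everyone else:  {z_for_rate(REST_RATE, (1 - SHARE) * TOTAL_TRIALS):.2f}")
```

On these assumptions, the whole series gives z of roughly 3.5 and the single subject roughly 4.2. The remaining data give roughly 1.5, short of the conventional 1.96 needed for two-tailed significance at the .05 level.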
In other words, it looks as if almost all the success of Jahn's huge data base can be attributed to the results from one individual, who, over the years, produced almost 25 percent of the data. This individual was not only the most experienced subject but also, presumably, familiar with the equipment. As Palmer points out, the Princeton experiments also provide inadequate documentation on precautions to prevent tampering by subjects. It therefore becomes even more important to see if the same degree of success can be achieved when the sessions are adequately monitored.

Alcock, in his review of the same RNG studies surveyed by Palmer, points to a number of weaknesses in both the Schmidt and the Princeton experiments. For example, he faults Schmidt's experiments for such things as inadequate controls, failure to examine the target sequences, overcomplicated experimental setups, inadequate tests of randomness, and lack of methodological rigor. Alcock faults the Princeton experiments for such things as failing to randomize the sequence of groups of trials at each session, inadequate documentation on precautions against data tampering, and possibilities of data selection.

Palmer and Alcock do not really differ in their assessments of the shortcomings of the Schmidt and Princeton RNG experiments. They do differ, however, on what conclusions can be drawn from such imperfect experiments. Palmer emphasizes the fact that the critics have not provided plausible explanations as to how the admitted flaws could have caused the observed results. His position seems to be that, unless the critics can provide such plausible alternatives, the results should be accepted as demonstrating an anomaly. Alcock focuses on the fact that the successful results have been obtained under conditions that fall short of the experimental ideals that parapsychologists themselves profess. He emphasizes that the parapsychologists have no right to claim to have demonstrated psi from experiments that have been conducted with "dirty test tubes." Such a revolutionary conclusion as the existence of psi demands justification from experiments that have clearly used "clean test tubes."

What would it take to conduct an adequate RNG experiment? May, Humphrey, and Hubbard (1980) attempted to answer this question by conducting one. After reviewing all available RNG experiments from 1970 through 1979 and taking into account the various deficiencies in these experiments, they gathered together and meticulously tested the components necessary to provide adequately randomized trials. They also devised a careful experimental protocol and set out in advance the precise criteria that would have to be fulfilled before they could call their results successful.
Going further, after they completed the experiment with results that met their criteria for success, they subjected their equipment to all sorts of physical extremes to see whether an artifact could produce a comparable degree of success. They report that this singularly well-controlled RNG experiment in fact met their criteria for success. It is unfortunate, therefore, that this carefully thought-out experiment was conducted only once. After the one successful series, using seven subjects, the equipment was dismantled, and the authors have no intention of trying to replicate it (personal communication, August 1986). This is unfortunate because this appears to be the only near-flawless RNG experiment known to us, and the results were just barely significant. Only two of the seven subjects produced significant results, and the test of overall significance for the total formal series yielded a probability of 0.029.
The experiment, while nearly flawless, still had some problems as evidence for psi. For one thing, it appeared only as a technical report in 1980 and has never been published in a refereed scientific journal. Despite the admirable attention to detail, all the control trials were taken when no human being was present. One might argue that this was not an ideal control for the experimental session, in which a subject was physically present in the room. The authors have assured us that their various attempts to bias the machine by physical means almost certainly rule out the possibility that the mere presence of a human being could have affected the output. However, a physicist who claims to have several years of experience in constructing and testing random number devices tells us that, under some circumstances, the human body can act as an antenna and, as a result, possibly bias the output.

May and his colleagues at SRI, in the same technical report in which they claim successful results for their single experiment, surveyed all the RNG experiments known to them through the year 1979 and found that their combined significance was astronomically high. They add (May, Humphrey, and Hubbard, 1980:8):

This impressive statistic must, however, be evaluated with respect to experimental equipment and protocols. All the studies surveyed could be considered incomplete in at least one of the following four areas:
(1) No control tests were reported in more than 44 percent of the references. Of those that did, most did not check for temporal stability of the random sources during the course of the experiment.
(2) There were insufficient details about the physics and constructed parameters of the experimental apparatus to assess the possibility of environmental influences.
(3) The raw data was not saved for later and independent analysis in virtually any of the experiments.
(4) None of the experiments reported controlled and limited access to the experimental apparatus.

As far as we can tell, the same four points can be made with respect to the RNG experiments that have been conducted since 1980. The situation for the RNG experiments thus seems to be the same as that for remote viewing. Over a period of approximately 15 years of research, only one successful experiment can be found that appears to meet most of the minimal criteria of scientific acceptability, and that one successful experiment yielded results that are just marginally significant.

RESEARCH ON THE GANZFELD

The Ganzfeld Experiments

The Ganzfeld psi experiments are named after the term used by Gestalt psychologists to designate the entire visual field. For
theoretical purposes, the Gestalt psychologists wanted to create a situation in which the subject or observer could view a homogeneous visual field, one with no imperfections or boundaries. Psychologists later discovered that when individuals are put into a Ganzfeld situation, they tend quickly to experience what they described as an altered state of mind. In the early 1970s, some parapsychologists decided that the use of the Ganzfeld would provide a relatively safe and easy way to create an altered state in their experimental subjects. They believed that such a state was more conducive to picking up the elusive psi signals.

In a typical psi Ganzfeld experiment, the subject, or percipient, has halved ping-pong balls taped over the eyes. The subject then reclines in a comfortable chair while white noise plays through earphones. A bright light shines in front of the subject's face. When seen through the translucent ping-pong balls, the light is experienced as a homogeneous, foglike field. When so prepared, almost all subjects report experiencing a pleasant, altered state within 15 minutes.

While one experimenter is preparing the subject for the Ganzfeld state, a second experimenter randomly selects a target pool from a large set. The target pool typically consists of four possible targets, usually reproductions of paintings or pictures of travel scenes. One of the four is chosen at random to be the target for that trial. The target is given to an agent, or sender, who tries to communicate its substance psychically to the subject in the Ganzfeld state. After a designated period, the subject is removed from the Ganzfeld state and presented with the four candidates from the target pool. The subject then ranks the four candidates in terms of how well each matched the experience of the Ganzfeld period. If the actual target is ranked first, the trial is designated a hit. An actual experiment consists of several trials.
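Each trial is scored simply as a hit or a miss, with a chance hit rate of 1 in 4 when every trial has its own four-item pool. An experiment's result can therefore be evaluated with an exact binomial test. The Python sketch below is illustrative only: `p_at_least` is a hypothetical helper, and the 30-trial, 12-hit example is invented. The calculation is valid only if trials are independent and targets are properly randomized, which are among the very points disputed in the critiques discussed next.

```python
from math import comb


def p_at_least(hits, trials, p=0.25):
    """Exact one-sided chance probability of `hits` or more direct hits.

    Assumes independent trials, each with its own four-picture target pool,
    so that a hit occurs by chance with probability p = 1/4.
    """
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(hits, trials + 1))


# Hypothetical experiment: 30 trials, 12 hits (40 percent against 25 percent expected).
print(round(p_at_least(12, 30), 4))
```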
With four candidates per trial, one of every four trials would be expected to produce a hit by chance. If the number of hits significantly exceeds the expected 25 percent, then the result is considered to be evidence for the existence of psi.

Critique of the Ganzfeld Experiments

Hyman carried out a careful and systematic review of the Ganzfeld experiments, undertaken in 1981 and published in the March 1985 issue of the Journal of Parapsychology. He concluded that the data base exhibited flaws involving multiple testing, inadequate controls for sensory leakage, inadequate randomization, statistical errors, and inadequate documentation. These flaws, in his opinion, were sufficient
to disqualify the Ganzfeld data base as evidence for psi. Of the 42 experiments, 39 (93 percent) used multiple analyses, which artificially inflated the chances of obtaining significant outcomes. Only 11 (26 percent) clearly indicated that they had adequately randomized the target selections. As many as 15 (36 percent) used inferior randomization, such as hand shuffling, or no randomization at all. The remaining 16 experiments did not supply sufficient information on how they had chosen the targets. As many as 23 of the experiments (55 percent) used only one target pool. This means that the subject was handed for judging not a copy of the target but the very same target that the agent had handled, permitting the possibility of sensory cueing. Although the argument for psi is mainly a statistical one, the reports of 12 experiments (29 percent) revealed statistical errors. A number of other departures from optimal practice were also found.

The same issue of the Journal of Parapsychology contained a lengthy rebuttal by parapsychologist Charles Honorton, one of the pioneers of the Ganzfeld psi technique. Honorton did three things:
- he disputed many of Hyman's opinions as to what constituted flaws;
- he provided a reanalysis of the data base to overcome many of the statistical weaknesses of the original experiments;
- he argued that the flaws he agreed existed were not sufficient to have accounted for the findings.

In this respect his analysis is consistent with Palmer's approach. He does not deny that the experiments depart from optimal design, but he argues that such departures are insufficient to account for the results.

Honorton and Hyman had the opportunity to discuss their differences about psi in general at the Parapsychological Association meetings in 1986. As a result, they agreed to draft a joint communique to emphasize those points on which they agree.
That communique appeared in the December issue of the Journal of Parapsychology (Hyman and Honorton, 1986). They agree that the current data base is insufficient to support either the conclusion that psi exists or the conclusion that the results are due to artifacts. They further agree that the issue can be settled only by future experiments conducted according to the stated standards of parapsychology, which are also the accepted standards of psychological research. Another important input to the committee's judgment on the Ganzfeld research was the systematic evaluation of the contemporary parapsychological literature by Charles Akers (1984), a former parapsychologist. Akers's critique used a methodological strategy different from Hyman's. Hyman undertook to evaluate the entire data base of a single research paradigm (Ganzfeld), including both successful and unsuccessful outcomes. Akers surveyed
contemporary ESP experiments broadly, but confined his evaluation to those that had produced significant results with unselected subjects. Hyman assigned flaws to experiments without regard to whether each flaw, by itself, could have caused the observed outcome. Akers charged a flaw to a study only if he thought the flaw could have been sufficient to produce the observed result. He chose a sample of 54 parapsychological experiments from areas of research that had been previously reviewed by Honorton or Palmer; his intent was to choose experiments that could be viewed as the best current evidence for the existence of psi. As a result of this exercise, he concluded (Akers, 1984:160-161):
Results from the 54-experiment survey have demonstrated that there are many alternative explanations for ESP phenomena; the choice is not simply between psi and experimenter fraud.... The numbers of experiments ... flawed on various grounds were as follows: randomization failures (13), sensory leakage (22), subject cheating (12), recording errors (10), classification or scoring errors (9), statistical errors (12), reporting failures (10).... All told, 85% of the experiments were considered flawed (46/54). This leaves eight experiments where no flaws were assigned.... Although none of these experiments has a glaring weakness, this does not mean that they are especially strong in either their methods or their results.... In conclusion, eight experiments were conducted with reasonable care, but none of these could be considered as methodologically ideal. When all 54 experiments are considered, it can be stated that the research methods are too weak to establish the existence of a paranormal phenomenon.
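Two of the statistical points in the Ganzfeld critique can be made concrete: how a hit count is judged against the 25 percent chance level, and how running several analyses on the same experiment inflates the chance of a "significant" outcome. The Python sketch below is illustrative only. The session counts are hypothetical, the helper names `binomial_tail` and `familywise_error` are invented for this example, and the familywise formula assumes the analyses are independent; analyses of one data set usually are not, which typically inflates the error rate somewhat less, but still inflates it.

```python
from math import comb


def binomial_tail(hits: int, trials: int, p: float = 0.25) -> float:
    """Exact P(X >= hits) for X ~ Binomial(trials, p): the probability of
    scoring at least this many hits by guessing alone."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(hits, trials + 1))


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Probability that at least one of k independent analyses comes out
    'significant' at level alpha when no real effect exists."""
    return 1 - (1 - alpha) ** k


# Hypothetical numbers for illustration only (not from any Ganzfeld study).
trials, hits = 40, 15
print(f"expected by chance: {trials * 0.25:.0f} hits")
print(f"P(>= {hits} hits by chance) = {binomial_tail(hits, trials):.3f}")

# With several analyses per experiment, a nominal 5% error rate grows quickly.
for k in (1, 3, 5, 10):
    print(f"{k:2d} analyses -> chance of at least one false positive "
          f"{familywise_error(k):.2f}")
```

A common remedy is to specify a single analysis in advance, or to divide alpha by the number of analyses performed (the Bonferroni correction).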
RESEARCH ON ELECTRICAL ACTIVITY AND EMOTIONAL STATES
The Backster Laboratory
In addition to examining parapsychological research in areas that have produced large literatures, the committee witnessed an example of experimental work at a far less developed stage. On February 10, 1986, committee members visited the Backster Research Foundation in San Diego and saw a demonstration of experimental procedures for detecting a correlation between the electrical activity of oral leukocytes and the emotional states of the donor. Cleve Backster is a polygraph specialist who had at one time helped develop interrogation techniques for the Central Intelligence Agency and now runs his own polygraph school in San Diego. The school is housed in the same rooms that constitute the Backster Research Foundation, which is devoted to the study of what Backster refers to as primary perception. Backster's research on paranormal matters
began in February 1966, when he recorded, from a philodendron plant that he had hooked up to a polygraph, a response he recognized as similar to that of human beings in emotional states. Backster believed he had demonstrated that the plant showed such emotional response when brine shrimp or other living organisms were either threatened or actually killed in an adjoining room. The notion of primary perception in plants became both a popular subject for research and a highly controversial concept during the late 1960s and early 1970s. We were told that Backster has quietly continued his researches into this and related matters. He has now devised a technique for recording electrical activity in leukocytes taken from a donor's mouth. The advantage of this technique, we were told, is that the leukocytes respond mostly to emotional states of the donor. One committee member volunteered to be the demonstration subject. Another member accompanied him to observe the techniques for obtaining the leukocytes and preparing them for recording. The sample was obtained by having the subject "chew" on a 1.2 percent saline solution and then spit it back into a centrifuge tube. Ten such samples were obtained in this way. The samples were then spun in a centrifuge for six minutes, and the particulate matter at the bottom of each tube was pipetted into the preparation tube. The preparation tube contained about one centimeter of particulate matter and was filled almost to the top with 1.2 percent saline solution. Two uninsulated wire electrodes were inserted into the bottom of the tube, which was then placed within a shielded cage and connected by leads to an EEG-type recording apparatus. During the demonstration, the subject sat approximately two meters from the preparation. We were told that subjects usually sit about five meters from the preparation.
A split-screen projection video display was provided: the lower portion of the screen recorded the movements of the polygraph paper and pen as they produced a record of the electrical activity presumably taking place in the leukocyte preparation. The upper portion of the screen recorded the behavior of the seated subject. In his previous research using this arrangement, Backster reported that, when the subject revealed an emotional reaction, the electrical action of the leukocytes showed a corresponding reaction. During our demonstration, the polygraph record produced several strong deflections in both the control and the experimental series, but they did not obviously correlate with any corresponding thoughts or emotional states of the subject as various stimuli were presented. Backster suggested that this was probably because so many people were crowded into the laboratory that the leukocytes were responding to thoughts and feelings of other individuals in the room. Thus, a demonstration of results, as opposed to techniques, was not, after all, going to be possible during our visit. Backster then showed us videotapes of the split-screen results he had obtained in his "formal" experiments. The results consisted of 12 examples of apparent correlations between an emotional response and a deflection of the polygraph record. The 12 examples came from 7 sessions with 7 different subjects. Although the information is not given in his written report, it appears that each session lasted for approximately half an hour. During this time, the donor is engaged in conversation or watches videotapes of television programs. The sessions are not standardized or planned. Backster's intent, apparently, is to elicit spontaneous emotional responses from a subject during the session. He believes that a stimulus that evokes an emotional response in one subject will not necessarily do so in another subject. In one example, the subject was a young man who was looking at an issue of Playboy magazine. The polygraph tracing began to display large deflections soon after he encountered a nude photograph of an attractive young woman. The large deflections continued for approximately two minutes; the tracing slowly settled down to normal activity after the magazine was closed. Soon after, the young man reached for the closed magazine, and the record reveals a single deflection at that point. In another example, the subject was a retired police lieutenant. When discussing his approaching retirement, he was asked a question about his wife's attitude toward having him "underfoot." A large deflection of the polygraph tracing occurred soon after this question was asked. When asked, the donor confirmed that he was emotionally aroused at that moment in the session (see Backster and White, 1985).
Cleve Backster and his supporters apparently believe that he has successfully demonstrated that detached oral leukocytes respond to the emotions of their donor even when separated by as much as several miles. They also believe that these results are reliable and replicable.
Critique of the Backster Experiment
What we have read and observed about Backster's procedures does not justify the claim he is making. His answers to our questions made it clear that he has not considered using the appropriate controls needed to ensure that the obtained "correlations" are real and due to the causes he has assumed. To make adequate physiological recordings from a
preparation of in vitro leukocytes and to demonstrate the correlation between emotional response and leukocyte activity requires experimental arrangements and procedures at a level of sophistication well beyond those we observed. Committee members who are knowledgeable about the procedures and instrumentation of psychophysiological experiments expressed doubts about the adequacy of the setup to perform the tasks Backster has undertaken. Serious doubts were expressed about the possibility that the leukocytes were alive at the time of recording. Further doubts were expressed about the setup's ability to avoid contamination of the recording procedures by stray influences of various sorts. We do not discuss these drawbacks in detail here. We confine our discussion to Backster's method for establishing a correlation between the alleged activity of the detached leukocytes and the emotional state of the donor. When we consider how the existence of such correlations was established, we again see how inappropriate methodology can lead to very misleading conclusions. Many problems exist with regard to Backster's procedures for detecting correlations. Demonstrating a pattern of covariation over time requires two records of behavior; here, one record is the tracing of amplified electrical activity coming from the electrodes and through the leads. Although this tracing can be quantified, Backster has apparently made no attempt to do so. Instead, he has relied on visual inspection of the polygraph record to pick out points at which the deflections of the pen from the baseline are noticeable. Although such subjective judgment is scientifically unacceptable, the deflections that he uses in his examples seem sufficiently marked that they probably can be considered real deviations from the baseline. At any rate, let us assume that responses on the polygraph record can be visually pinpointed with reasonable objectivity.
The deflections on the polygraph record are then compared with happenings on the concurrent videotaping of the conversation with the subject. Here we encounter very serious problems as to what constitutes an emotional response on this behavioral record. Backster believes he can identify categories of potentially emotionally arousing stimuli in the nonstandardized, qualitative, ongoing record of conversation. He then can determine if the subject was experiencing an emotional reaction to such a stimulus by simply replaying the record, pointing to the segment that corresponds to a place where the polygraph showed a deflection, and asking the subject if he or she recalls what was taking place at that moment as an emotionally arousing experience. If the subject agrees, this is said to confirm a "correlation" between the emotional state and the corresponding activity of the tracing. Such a purely subjective determination of an emotional response opens
the process to a variety of known biases, many of them discussed in the paper prepared for the committee by Griffin (Appendix B). The literature on "illusory correlation" (Alloy and Tabachnik, 1984; Griffin paper) makes it clear how subjective expectations and cognitive biases can lead to false impressions of correlation. Backster's method of searching for correlations compounds these inevitable biases: he does not independently determine moments of emotional response in the subject's behavioral record and moments of polygraph deflections and then look for a match between the two. Instead, he apparently looks for polygraph deflections and then tries to determine if an emotional response can be found that occurred in the vicinity of the polygraph activity. In other words, the determination of the emotional response is done with full knowledge of the fact that a polygraph deflection has occurred. Under such circumstances, we would expect processes of subjective validation to operate. In addition, the method of verifying the emotional response, by asking the subject to acknowledge that he or she was in fact experiencing such a state at the moment the polygraph record indicated a leukocyte response, is itself suspect. This is the sort of circumstance in which demand characteristics (i.e., responses determined by the presumed intent of the experimenters) are known to operate. Good science dictates that the moments of emotional response should be determined independently of the moments of polygraph response. Both the experimenter and the subject must be blind to the polygraph record when determining the moments of emotional response. Only when the determination of events on the two records has been made independently of each other can the records be compared to determine if the emotional responses and the polygraph activity are correlated.
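Why searching the vicinity of each deflection is misleading can be shown with a small simulation. The Python sketch below generates polygraph deflections and "emotional moments" at random and independently of each other, then applies a search-around-the-deflection rule. The session length, event counts, 30-second window, and the names `vicinity_matches` and `simulate` are assumptions chosen only for illustration; every match it reports is a coincidence by construction.

```python
import random


def vicinity_matches(deflections, emotions, window):
    """Count deflections that have at least one emotional moment within
    `window` seconds on either side, i.e., a story found after the fact."""
    return sum(any(abs(d - e) <= window for e in emotions) for d in deflections)


def simulate(session_s=1800.0, n_defl=20, n_emot=20, window=30.0,
             runs=500, seed=1):
    """Average number of 'confirmed' deflections per session when the two
    event streams are generated independently (no real connection)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        deflections = [rng.uniform(0, session_s) for _ in range(n_defl)]
        emotions = [rng.uniform(0, session_s) for _ in range(n_emot)]
        total += vicinity_matches(deflections, emotions, window)
    return total / runs


print(f"average chance 'matches' per 30-minute session: {simulate():.1f} of 20")
```

Widening the window, or loosening what counts as an emotional moment, raises the count further. Only blind, independent coding of the two records, followed by a comparison, avoids manufacturing matches this way.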
Illusory correlations occur because our subjective judgments of covariation tend to use only a portion of the relevant information and because we tend to interpret observed events in light of our expectations. In particular, intuitive judgments of covariation tend to focus only on the co-occurrence of the treatment of interest and successful outcomes, ignoring times when the treatment co-occurred with unsuccessful outcomes. Backster uses only those examples from his records in which an emotional response co-occurs with a polygraph deflection; the 12 such examples from the 7 experimental series represent a very small fraction of the total data collected. Not only is a sample of just 12 co-occurrences probably too small for estimating whether a true correlation exists, but it is also impossible from this information alone to estimate whether any correlation exists. All the data are needed for this purpose. Almost certainly, more than 12 polygraph deflections must have appeared in the total record. In the brief demonstration for the committee, both the control and the experimental series
yielded several deflections, so it is reasonable to assume that many more than 12 deflections were obtained in the complete record. It is likely that these unreported deflections were not preceded by any emotional responses. Almost certainly, more than 12 emotional responses must have appeared in the total record. The point of conducting the sessions was to expose the subjects to a variety of emotional stimuli; therefore, it is essential to know the number of times that emotional responses occurred without the corresponding occurrence of polygraph responses. Finally, to determine correlation, it is essential to know the frequency of co-occurrence of the absence of emotional responses and the absence of polygraph responses. All this information is needed to determine whether the claimed correlation exists. All the data must be used. From these data, one can compare the proportion of times that an emotional response is followed by a polygraph response with the proportion of times that the absence of an emotional response is followed by a polygraph response. Only if these two proportions are significantly different from one another can we assume that the data provide evidence for a correlation between emotional response and leukocyte activity. The fact that Backster was able to find 12 examples of the co-occurrence between emotional response and polygraph deflection, even if these correspondences had come from double-blind matching, provides us with absolutely no information about whether a correlation exists. The stronger claim would be, of course, not that a correlation exists, but that a causal connection exists between the subject's emotional states and the responses of the detached leukocytes. As Chapter 3 on evaluation indicates, such a causal explanation requires much more than the demonstration of correlation between two series.
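The comparison of the two proportions described above amounts to a test on a 2 x 2 table, and the 12 reported co-occurrences fill only one of its four cells. The Python sketch below holds that cell at 12 and shows that the other three cells, which Backster did not report and which are invented here, can make the evidence anything from null to strong. It assumes each session is divided into fixed-length time segments classified as emotional or not and as deflection or not, and it uses a two-sided Fisher exact test written with the standard library; that test is one standard choice, not a procedure taken from this report, and the helper names are hypothetical.

```python
from math import comb


def proportions(a, b, c, d):
    """a: emotion + deflection, b: emotion only, c: deflection only,
    d: neither. Returns P(deflection | emotion), P(deflection | no emotion)."""
    return a / (a + b), c / (c + d)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table at least as improbable as observed.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Same 12 co-occurrences, two hypothetical completions of the table.
for label, table in [("no association", (12, 48, 48, 192)),
                     ("strong association", (12, 3, 5, 100))]:
    p_emo, p_none = proportions(*table)
    print(f"{label}: {p_emo:.2f} vs {p_none:.2f}, "
          f"p = {fisher_two_sided(*table):.2g}")
```

The test treats time segments as independent observations, which a continuous recording only approximates; the point of the sketch is simply that the missing cells, not the 12 matches, decide the question.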
Because Backster did not use double-blind procedures to determine emotional responses, and because the procedures he did use are known to be just those that facilitate the occurrence of a variety of subjective biases, he may well have obtained a correlation between his two series. However, his procedures for finding such correlations are sufficiently flawed that we do not know if in fact the suspected (and presumably biased) correlation actually does exist in his data. The Backster experiment indicates that the best intentions combined with scientific instrumentation and polygraphic records cannot, in themselves, guarantee data of scientific quality.
DISCUSSION OF THE SCIENTIFIC EVIDENCE
Both the parapsychologists cited in this report and the critics of parapsychology believe that the best contemporary experiments in parapsychology fall short of acceptable methodological standards. The critics
conclude that such data, based on methodologically flawed procedures, cannot justify any conclusions about psi. The parapsychologists argue that, while each experiment is individually flawed, when taken together they justify the conclusion that psi exists. Palmer's conclusion in this regard is unique. Although he agrees that the data do not justify the conclusion that a paranormal phenomenon has been demonstrated, he argues that the data, with all their drawbacks, do justify the conclusion that an anomaly of some sort has been demonstrated. It is this purported demonstration of an anomaly that, according to Palmer, further justifies the claim that parapsychologists do have a subject matter. The awkward aspect of Palmer's position is that, without an adequate theory, there is no way to know that the anomaly "demonstrated" in one experiment is the same anomaly "demonstrated" in another; indeed, there is no limit to the possible causes of the anomaly in a given experiment. Without an adequate theory, there is no reason to assume that the various anomalies constitute a coherent or intelligibly related class of phenomena. The committee distinguishes among three types of criticism that can be leveled at a given parapsychological finding. The first is what we might refer to as the smoking gun. This type of criticism asserts or strongly implies that the observed findings were due not to psi but to factor X. Such a claim puts the burden of proof on the critic. To back up such a claim, the critic must provide evidence that the results were in fact caused by X. Many of the bitter feuds between critics and proponents have resulted from the proponent's assuming, correctly or incorrectly, that this type of criticism was being made. The second type of criticism can be referred to as the plausible alternative.
In this case, the critic does not assert that the result was due to factor X, but instead asserts that the result could have been due to factor X. Such a stance also places a burden on the critic, but one not so stringent as the smoking gun assertion. The critic now has to make a plausible case for the possibility that factor X was sufficient to have caused the result. For example, optional stopping of an experiment on the part of a subject can bias the results, but the bias is a small one; it would be a mistake to assert that an outcome was due to optional stopping if the probability of the outcome is extremely low. Akers's critique, which was previously discussed, is an example based on the plausible alternative. The third type of criticism is what we have called the dirty test tube. In this case, the critic does not claim that the results have been produced by some artifact, but instead points out that the results have been obtained under conditions that fail to meet generally accepted standards. The gist of this type of criticism is that test tubes should be clean when doing
careful and important scientific research. To the extent that the test tubes were dirty, it is suggested that the experiment was not carried out according to acceptable standards. Consequently, the results remain suspect even though the critic cannot demonstrate that the dirt in the test tubes was sufficient to have produced the outcome. Hyman's critique of the Ganzfeld psi research and Alcock's paper on remote viewing and random number generator research are examples of this type of criticism. In the committee's view, it is in this latter sense, the dirty test tube sense, that the best parapsychological experiments fall short. We do not have a smoking gun, nor have we demonstrated a plausible alternative; but we imagine that even the parapsychological community must be concerned that their best experiments still fall far short of the methodological adequacy that they themselves profess. Honorton and Hyman differ on whether to assign a flaw in randomization to a particular series of experiments. With Honorton's assignment, the studies with adequate randomization do not differ in significance of outcome from those with inadequate randomization. With Hyman's assignment, the experiments with inadequate randomization have significantly more successful outcomes than do those with adequate randomization. A simple disagreement on one experiment can thus make a huge difference as to whether we conclude that this flaw contributed or did not contribute to the observed outcomes. Several similar examples could be cited to illustrate the extreme sensitivity of this data base to slight changes in flaw assignments. Even if Palmer is correct in asserting that in a particular case an anomaly has been demonstrated, serious problems remain. In astronomy and other sciences, an anomaly is a very precise and specifiable departure from a well-defined theoretical expectation.
Neptune was discovered, for example, when Leverrier was able to specify not only that the orbit of Uranus departed from that expected by Newtonian theory, but also precisely in what way it departed from expectation. Nothing approaching such a specifiable anomaly has been claimed for parapsychology. A vague and unspecifiable departure from chance is a far cry from a well-described and systematic departure from a precise theoretical equation. Leverrier's anomaly was consistent with only a very narrow range of possibilities. The sort of anomaly claimed for parapsychology is currently consistent with an almost infinite variety of possibilities, including artifacts of various kinds.
THE PROBLEM OF QUALITATIVE EVIDENCE
The committee continually encountered the distinction between qualitative and quantitative evidence for the existence of paranormal phenomena. Many proponents of the paranormal acknowledge such a difference in one way or another. Some realize that it is only quantitative evidence that will convince the scientific community. Although they themselves have relied on qualitative evidence for their own beliefs, they refer us to the RNG experiments of Robert Jahn or the remote viewing experiments at SRI as examples of supporting quantitative data. Most proponents seem impatient with the request for scientific evidence. They have been convinced through their own experiences or the vivid testimonies of individuals whom they trust. Many argue that qualitative evidence can be as good as quantitative; indeed, they claim that in some circumstances it can be better. The arguments for the superiority of qualitative evidence are based in many cases on such factors as ecological validity, conducive atmosphere, and holism. The ecological validity argument asserts that the artificial conditions required for laboratory experiments are so different from the natural settings in which paranormal phenomena typically occur that findings from such controlled studies are irrelevant. By removing the psychic from his or her natural domain or by arranging conditions to suit the needs of scientific observation, it is claimed, the scientist destroys the very phenomenon under question. The ecological validity argument is closely related to the other arguments. Proponents who emphasize the conducive atmosphere assert that the austere conditions of strict laboratory procedure create an atmosphere that is numbing or inimical to psychic functioning. Those who emphasize holism point out that the experimental procedures necessarily dissect and focus on restricted portions of a system. Such compartmentalization, it is claimed, makes it impossible to study the sorts of paranormal phenomena that operate only as a total system in a naturalistic context.
QUALITATIVE EVIDENCE AND SUBJECTIVE BIASES
What is meant by qualitative evidence? Roughly, it means any sort of nonscientific evidence that proponents find personally convincing. Typically, it involves personally experiencing or witnessing the phenomenon. Less compelling, but still effective, is the testimony of friends or trusted acquaintances who have personally experienced it. Even individuals who are intellectually aware of the pitfalls of personal observation and testimony find it difficult, even impossible, to disregard the compelling quality of such evidence in the formation of their own beliefs. A major parapsychologist admitted to one committee member that the scientific evidence did not justify concluding that psi exists. "As a trained scientist," he said, "I know quite well that by scientific criteria there is no evidence for the existence of psi. In fact, I have always argued with
my parapsychological colleagues that they are making a serious mistake in trying to get the scientific community to take their current evidence seriously. Before they do this, they first have to be able to collect the sort of repeatable and lawful data that constitute scientific evidence." This same parapsychologist then explained why, despite the current lack of evidence, he remained a parapsychologist. "When I was 16 I had some personal experiences of a psychic nature that were so compelling that I have no doubt that they were real. Yet, as a trained scientist, I know that my personal experiences and subjective convictions cannot and should not be the basis for asking others to believe me." This parapsychologist is unusual in that he makes the distinction within himself between beliefs that are subjectively compelling and beliefs that are scientifically justifiable. More typical is the proponent who, as a result of compelling personal experience, not only has no doubt about the reality of an underlying paranormal cause, but also has no patience with the refusal of others to support that belief. We see two problems regarding qualitative evidence. First, personal observation and testimony are subject to a variety of strong biases of which most of us are unaware. When such observations and testimony emerge from circumstances that are emotional and personal, the biases and distortions are greatly enhanced. Psychologists and others have found that the circumstances under which such evidence is obtained are just those that foster a variety of human biases and erroneous beliefs. Second, beliefs formed under such circumstances tend to carry a high degree of subjective certainty and often resist alteration by later, more reliable disconfirming data.
Such beliefs become self-sealing, in that when new information comes along that would ordinarily contradict them, the believers find ways to turn the apparent contradictions into additional confirmation. The committee asked Dale Griffin to describe many of the ways in which cognitive and social psychologists have documented that human subjective judgment can lead us astray. Griffin's paper emphasizes the cognitive biases termed availability and representativeness, but he also discusses motivational biases. Although most of these biases have been demonstrated under laboratory conditions, they are nonetheless quite powerful, and evidence has been mounting that, if anything, they are much more powerful in natural settings. Griffin points out that one vivid, concrete experience is usually sufficient to outweigh conclusions drawn from hundreds or thousands of cases summarized in abstract statistics. These and the other biases discussed by Griffin should make us wary of conclusions based on qualitative evidence.
EXAMPLES OF PROBLEMATIC BELIEFS
In this section we discuss some examples of beliefs about paranormal phenomena that have been formed under conditions known to generate cognitive illusions and strong delusional beliefs. We attempt to make clear why we are skeptical of any evidence offered in support of the paranormal that does not strictly fulfill scientific criteria. We believe it is important to realize the power of such conditions to create strong but false beliefs. In 1974 a group of distinguished physicists at the University of London observed renowned psychic Uri Geller apparently bend metallic objects and cause part of a crystal, encapsulated in a container, to disappear. Impressed with what they saw, in 1975 these scientists contributed an article to Nature outlining their ideas about how to conduct successful parapsychological research (reprinted in Hasted et al., 1976). In their discussion they note that successful results depend on the relation among the participants and that phenomena are more likely to occur when all participants are in a relaxed state, all sincerely want the psychic to succeed, and "the experimental arrangement is aesthetically or imaginatively appealing to the person with apparent psychokinetic powers." Hasted and his colleagues describe further desiderata. The psychic should be treated as one of the experimental team, contributing to an attitude of mutual trust and confidence that facilitates successful appearance of the allegedly paranormal effects. The slightest hint of suspicion on the part of the observers can stifle the occurrence of any phenomena. Observers should avoid looking for any particular outcome, since doing so interferes with the required relaxed state of mind and impedes paranormal powers. To help avoid the inhibiting effects of concentrated attention, participants should talk and think about matters irrelevant to the experiment at hand.
Acknowledging that these desiderata make it difficult to preclude trickery, Hasted and his colleagues express confidence that they can both create psi-conducive conditions and eliminate the possibility of being tricked (Hasted et al., 1976:194): It should be possible to design experimental arrangements which are beyond any reasonable possibility of trickery, and which magicians will generally acknowledge to be so. In the first stages of our work we did in fact present Mr. Geller with several such arrangements, but these proved aesthetically unappealing to him. Although we may sympathize with the British physicists' desire to create conditions conducive to the appearance of genuine psychic powers, if such powers exist, we cannot fail to note the quandary that their efforts produce. In their quest for psi-conducive conditions, they have created guidelines that play into the hands of anyone intent on deceiving them.
The very conditions that are specified as being conducive to the appearance of paranormal phenomena are almost always precisely those that are conducive to the successful performance of conjuring tricks. One of the first rules the aspiring conjuror learns is never to announce in advance the specific outcome that he or she is going to produce. In this way onlookers will not know where and on what they should focus their attention and consequently will be less apt to detect the method by which the trick was accomplished. The authors' advice to avoid focusing on a predetermined outcome greatly facilitates the conjuror's task.

The insistence that the arrangements meet with the psychic's approval is by far the most devastating of these conditions. Geller will perform only if the conditions are "aesthetically pleasing." This amounts to giving the alleged psychic complete veto power over any situation in which he or she feels that success is not ensured. This in turn means that the psychic being tested, not the experimenters, is controlling the experiment. Surely the British physicists ought to realize the irony of their admission that all their experimental arrangements designed to preclude trickery turned out to be aesthetically unacceptable to Uri Geller.

Another example of beliefs generated in circumstances that are known to create cognitive illusions is macro-PK, which is practiced at spoon-bending, or PK, parties. The 15 or more participants in a PK party, who usually pay a fee to attend and bring their own silverware, are guided through various rituals and encouraged to believe that, by cooperating with the leader, they can achieve a mental state in which their spoons and forks will apparently soften and bend through the agency of their minds.
Since 1981, although thousands of participants have apparently bent metal objects successfully, not one scientifically documented case of paranormal metal bending has been presented to the scientific community. Yet participants in the PK parties are convinced that they have both witnessed and personally produced paranormal metal bending. Over and over again we have been told by participants that they know that metal became paranormally deformed in their presence. This situation gives the distinct impression that proponents of macro-PK, having consistently failed to produce scientific evidence, have forsaken the scientific method and undertaken a campaign to convince themselves and others on the basis of clearly nonscientific data: personal experience and testimony obtained under emotionally charged conditions.

Consider the conditions that leaders and participants agree facilitate spoon bending. Efforts are made to exclude critics because, it is asserted, skepticism and attempts to make objective observations can hinder or prevent the phenomena from appearing. As Houck, the originator of the PK party, describes it, the objective is to create in the participants a
peak emotional experience (Houck, 1984). To this end, various exercises involving relaxation, guided imagery, concentration, and chanting are performed. The participants are encouraged to shout at the silverware and to "disconnect" by deliberately avoiding looking at what their hands are doing. They are encouraged to shout "Bend!" throughout the party. "To help with the release of that initial concentration, people are encouraged to jump up or scream that theirs is bending, so that others can observe." Houck makes it clear that the objective is to create a state of emotional chaos: "Shouting at the silverware has also been added as a means of helping to enhance the emotional level in a group. This procedure adds to the intensity of the command to bend and helps create pandemonium throughout the party."

A PK party obviously is not the ideal situation for obtaining reliable observations. The conditions are just those which psychologists and others have described as creating states of heightened suggestibility and implanting compelling beliefs that may be unrelated to reality. It is beliefs acquired in this fashion that seem to motivate persons who urge us to take macro-PK seriously. The complete absence of any scientific evidence does not discourage the proponents; they have acquired their beliefs under circumstances that instill zeal and subjective certainty. Unfortunately, it is just these circumstances that foster false beliefs.

DISCUSSION OF QUALITATIVE EVIDENCE

Our analysis of the evidence put before us indicates that even the most solidly based arguments for the existence of paranormal phenomena fall short of the currently accepted parapsychological standards. Even if the best evidence had been collected according to acceptable scientific standards, most proponents would in fact have remained convinced by personal experiences and data that clearly fall far short of scientific acceptability.
We have looked at two examples to make clear why and in what ways such failures to meet acceptable standards render the corresponding arguments useless as evidence for the paranormal, even though they have created compelling and strongly held beliefs in those who have been exposed to them. The examples illustrate how different ways of attempting to acquire evidence for paranormal phenomena can depart from adequate standards. These inadequacies become especially critical when we note that the conditions under which the alleged paranormal phenomena are supposed to occur are just those known to foster biases and false beliefs.

The PK parties, while creating powerful beliefs in paranormal metal bending, clearly violate almost every principle for obtaining trustworthy data. These parties offer no standardization, no objective records, and no
controls against self-deception or the deliberate deception of others. All participants, including the leader, are encouraged to achieve a peak emotional state, and general chaos is encouraged.

The suggestions of a group of British physicists for testing alleged psychics are aimed at somehow combining the desire to keep the psychic from feeling inhibited with the desire to obtain evidence of acceptable scientific quality. The observers' zeal for making the psychic feel trusted produces conditions that make scientific observation impossible: observers are instructed to refrain from focusing attention on any expected result, and the experimental arrangement must be aesthetically acceptable to the psychic, a condition that in effect puts the psychic in control of the experiment.

The search for psi-conducive conditions is understandable. Parapsychological research, even at its best, has been continually frustrated by the lack of robust, lawful, and repeatable outcomes, yet parapsychologists have experienced phenomena or have encountered data that have convinced them of the reality of the paranormal. When they try to put such evidence before their critics, however, the phenomena have a habit of disappearing. If one fervently believes that the phenomena are real, then it becomes easy to imagine a variety of reasons why they are elusive and hard to produce on demand. When proponents encounter a new phenomenon or psychic, they are strongly motivated to create conditions that will not drive the phenomenon away. The special atmosphere of PK parties and the suggestions of the British physicists are just two examples of attempts to generate psi-conducive conditions that also seem to be deception-conducive and bias-conducive.

CONCLUSIONS

In drawing conclusions from our review of evidence and other considerations related to psychic phenomena, we note that the large body of research completed to date does not present a clear picture.
Overall, the experimental designs are of insufficient quality to arbitrate between the claims made for and against the existence of the phenomena. While the best research is of higher quality than many critics assume, the bulk of the work does not meet the standards necessary to contribute to the knowledge base of science. Definitive conclusions must depend on evidence derived from stronger research designs.

The points below summarize key arguments in this chapter.

1. Although proponents of ESP have made sweeping claims, not only for its existence but also for its potential applications, an evaluation of the best available evidence does not justify such optimism. The strongest
claims have been made for remote viewing and the Ganzfeld experiments. The scientific case for remote viewing is based on a relatively small number of experiments, almost all of which have serious methodological defects. Although the first experiments of this type were begun in 1972, the existence of remote viewing still has not been established. Furthermore, although success rates varying from 30 to 60 percent have been claimed for the Ganzfeld experiments, the evidence remains problematic because all the experiments deviate in one or more respects from accepted scientific procedures. In the committee's view, the best scientific evidence does not justify the conclusion that ESP (that is, gathering information about objects or thoughts without the intervention of known sensory mechanisms) exists.

2. Nor does scientific evidence offer support for the existence of psychokinesis, that is, the influence of thoughts upon objects without the intervention of known physical processes. In the experiments using random number generators, the reported size of effects is very small, a hit rate of no more than 50.5 percent compared with the chance expectancy of 50 percent. Although analysis indicates that overall significance for the experiments, with their unusually large number of trials, is probably not due to a statistical fluke, virtually all the studies depart from good scientific practice in a variety of ways; furthermore, it is not clear that the pattern of results is consistent across laboratories. In the committee's view, any conclusions favoring the existence of an effect so small must at least await the results of experiments conducted according to more adequate protocols.

3. Should the Army be interested in evaluating further experiments, the following procedures are recommended: first, the Army and outside scientists should arrive at a common protocol; second, the research should be conducted according to that protocol by both proponents and skeptics; and third, attention should be given to the manipulability and practical application of any effects found. Even if psi phenomena are determined to exist in some sense, this does not guarantee that they will have any practical utility, let alone military applications. For this to be possible, the phenomena would have to obey causal laws and be manipulable.

4. The committee is aware of the discrepancy between the lack of scientific evidence and the strength of many individuals' beliefs in paranormal phenomena. This is a cause for concern. Many of the world's most prominent scientists have concluded that such phenomena exist and that they have been scientifically verified. Yet in just about all these cases, subsequent information has revealed that their convictions were misguided. We also are aware that many proponents believe that the scientific method may not be the only, or the most
appropriate, method for establishing the reality of paranormal phenomena. Unfortunately, the alternative methods that have been used to demonstrate the existence of the paranormal create just those conditions that psychologists have found to enhance human tendencies toward self-deception and suggestibility. Concerns about making the experimental situation comfortable for the alleged psychic or conducive to paranormal phenomena frequently result in practices that also increase opportunities for deception and error.

SOURCES OF INFORMATION

Two of the military officers who briefed us during our first meeting urged the committee to give serious consideration to paranormal phenomena and related parapsychological techniques. They described a variety of such phenomena that they felt had military potential, either as threats to security or as aids to defense. Site visits to leading laboratories and a paper prepared for the committee also contributed to the bases for the committee's work. Briefings were given to committee members by Robert Jahn, Cleve Backster, Helmut Schmidt, members of the staff of the Stanford Research Institute, and the U.S. Army Laboratory Command in Adelphi, Maryland. The paper prepared by James Alcock provided detailed reviews of the available evidence on random event generators and remote viewing. In addition, the committee benefited from a thorough review conducted for the Army Research Institute by John Palmer and from its own review of recent articles in the Journal of Parapsychology and other relevant periodicals and handbooks.
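The second conclusion above notes that a hit rate of no more than 50.5 percent against a chance expectancy of 50 percent can still be statistically significant when the number of trials is unusually large. A minimal sketch of why, assuming independent binary trials with a chance hit probability of exactly 0.5 and using the normal approximation to the binomial test, is given below. It is an illustration only, not the analysis used in any of the random number generator studies; the function names `binomial_z` and `two_sided_p` and the trial counts are hypothetical.

```python
import math


def binomial_z(hit_rate: float, n_trials: int, p0: float = 0.5) -> float:
    """Normal-approximation z-score for an observed hit rate versus chance rate p0.

    Assumes n_trials independent binary trials, each with hit probability p0
    under the null hypothesis (pure chance).
    """
    # Standard error of a sample proportion when the null hypothesis is true.
    se = math.sqrt(p0 * (1.0 - p0) / n_trials)
    return (hit_rate - p0) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z-score: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical trial counts chosen only to show the trend; they are not
    # taken from any of the studies reviewed by the committee.
    for n in (1_000, 100_000, 1_000_000):
        z = binomial_z(0.505, n)
        print(f"N = {n:>9,}  z = {z:6.2f}  two-sided p = {two_sided_p(z):.2g}")
```

Under these assumptions the z-score reduces to z = 0.01 × √N, so a 50.5 percent hit rate reaches the conventional two-sided 5 percent threshold (z ≈ 1.96) at roughly N = 38,416 trials and reaches z = 10 at one million trials. A large z-score of this kind rules out sampling fluctuation only; it cannot distinguish a genuine effect from the departures from good scientific practice described above, since a small systematic bias in equipment or procedure would yield the same statistic.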