Getting the Facts Straight: Dean Radin Responds to a Skeptic’s Conviction
Professor Daryl Bem, a prominent psychologist from Cornell University (now retired), will soon publish an article in the Journal of Personality and Social Psychology, a top-ranked, mainstream psychology journal. The article reports nine experiments involving 1,000 subjects, each study investigating an aspect of precognition – perception of the future. His approach was to take experiments commonly used in social psychology and add a clever twist to reverse the usual cause-effect sequence. Eight of the experiments resulted in statistically significant outcomes, and the combined results of the nine studies were astronomically significant, with odds against chance far beyond a million to one.
That a well-regarded journal would dare publish this article has outraged a few scientists, prompting a great gnashing of teeth and worries about unleashing a wave of superstitious nonsense. The gnashing has been covered in dozens of media outlets, and it finally bubbled up to the attention of the mainstream media, including the New York Times and NPR. A webpage on the “World of Parapsychology” site has been keeping track of the worldwide media attention. Some of the op-eds border on the hysterical; others are more rational. The comments to the op-eds show the same polarization: some scientists froth with anger over the mere possibility that precognition might be real, others call for tolerance when entertaining new ideas, while the general public can’t figure out what all the fuss is about.
One of the frothier venues, the Skeptical Inquirer, published an online commentary on Bem’s article written by uber-skeptic James Alcock, professor of psychology at York University in Toronto. I will not address Alcock’s critique of Bem’s procedure (Bem does that calmly and effectively himself), but I will comment on his preamble, which I reproduce here in portions. Alcock writes:
Parapsychology has long struggled unsuccessfully for acceptance in the halls of science. Could this article be the breakthrough? After all, it apparently provides evidence compelling enough to persuade the editors of that APA journal of its worthiness. However, this is hardly the first time that there has been media excitement about new “scientific” evidence of the paranormal. Over the past 80-odd years, this drama has played out a number of times, and each time, parapsychologists ultimately failed to persuade the scientific world that their phenomena actually exist. Recalling George Santayana’s now-clichéd dictum, “Those who cannot remember the past are condemned to repeat it,” we should approach Bem’s work with a historical framework to guide us.
Alcock fails to mention that one reason for the “failure to persuade” are myths about psi research that are incessantly repeated by skeptics whose inability to accept anything new makes it impossible for any new evidence to sway their original beliefs. The strategy of repeating the same talking points ad infinitum is effective because most listeners eventually absorb those words and assume that they are true (a lie told often enough usually becomes accepted as truth). A genuine skeptic would wonder if Alcock’s critique is backed up by solid facts. After doing some homework, he or she would discover that most of it isn’t. Alcock continues:
Consider the following: In 1934, Joseph Banks Rhine published Extra-Sensory Perception (Rhine & McDougall, 1934/2003), summarizing his careful efforts to bring parapsychology into the laboratory through application of modern psychological methodology and statistical analysis. Based on a long series of card-guessing experiments, he wrote: “It is independently established on the basis of this work alone that Extra-Sensory Perception is an actual and demonstrable occurrence” (p. 210). Elsewhere, he wrote: “We have then, for physical science, a challenging need for the discovery of the energy mode involved. Some type of energy is inferable, and none is known to be acceptable . . .” (p.166). Despite Rhine’s confidence that he had established the reality of extrasensory perception, he had not done so. Methodological problems eventually came to light, and as a result, parapsychologists no longer run card-guessing studies, and rarely even refer to Rhine’s work.
This sounds like a slam dunk, except that Alcock’s perception of history is distorted. In Rhine’s 1940 book, Extra-Sensory Perception after Sixty Years, which covers the period 1880 to 1940, Rhine and his coauthors discussed in great detail every critique their work had received and how potential design and analytical loopholes were addressed in their subsequent experiments. They also listed all known replications of their card-guessing method, dozens of them conducted at universities around the world. This volume makes it clear that Rhine and his colleagues were every bit as methodologically sophisticated and as hard-nosed as their harshest critics, and that their data – when viewed under the most critical light available at the time – withstood those critiques.
The key reason why Rhine’s work failed to sustain the initial excitement it had generated in the 1930s was the rise of behaviorism in academic psychology. Within that paradigm, not only was ESP considered to be impossible, but any form of subjective experience, including conscious awareness itself, became a forbidden topic. And the reason few researchers today use ESP cards is not because the method was flawed but because better methods have been developed. Like any other area of research, methods and ideas naturally evolve and build upon the work of previous generations. Alcock continues:
Physicist Helmut Schmidt conducted numerous studies throughout the 1970s and 1980s that putatively demonstrated that humans (and animals) could paranormally influence and/or predict the output of random event generators. Some of his claims were truly extraordinary – for example, that a cat in a cold garden shed, heated only by a lamp controlled by a random event generator, was able through psychokinetic manipulation of the random event generator to turn the lamp on more often than would be expected by chance. His claim to have put psi on a solid scientific footing garnered considerable attention, and his published research reported very impressive p-values. In my own extensive review of his work (Alcock, 1988), I concluded that Schmidt had indeed accumulated impressive evidence that something other than chance was involved. However, I found serious methodological errors throughout his work that rendered his conclusions untenable, and the “something other than chance” was attributable to methodological flaws.
As with Rhine, excitement about Schmidt’s research gradually dwindled to the point that his work became virtually irrelevant, even within parapsychology itself.
Blithe accusations of “I found serious methodological errors” provide an easy justification for dismissing this remarkable body of research. Is it really true that Alcock’s dissection of Schmidt’s work made that research irrelevant, or that other researchers did not follow it up? No. The assertion is so wrong that it verges on breathtaking. Hundreds of experiments involving random number generators (RNGs) were published after Schmidt’s studies, and meta-analyses of those experiments have been published and debated in mainstream physics and psychology journals. Schmidt’s work inspired dozens of researchers to replicate and extend it, and continues to do so today in research programs like the Global Consciousness Project. It also led to several psi-related patents. Alcock charges ahead:
The 1970s gave rise to “remote viewing,” a procedure through which an individual seated in a laboratory could supposedly receive psychic impressions of a remote location being visited by someone else. Physicists Russell Targ and Harold Puthoff claimed that their series of remote viewing studies demonstrated the reality of psi. This attracted huge media attention, and their dramatic findings (Targ & Puthoff, 1974) were published in Nature, one of the world’s top scientific journals. At first, their methodology seemed unassailable, but years later, when more detailed information became available, it became obvious that there were fundamental flaws in their procedure that could readily account for their sensational findings. When other researchers repeated their procedure with the flaws intact, significant results were obtained; with flaws removed, outcomes were not significant (Marks & Kamman, 1978; 1980).
Add Targ and Puthoff to the list of “breakthrough” researchers whose work is now all but forgotten.
Did obvious flaws adequately account for the results of remote viewing studies? No. Were those study designs abandoned? No. Did skeptics like Ray Hyman, who reviewed a tiny subset of the SRI/SAIC remote viewing studies for the CIA, conclude that the studies were flawed? No. Did this research paradigm, which was an updated version of picture-drawing techniques developed a half-century earlier, disappear? The answer, again, is no.
Targ and Puthoff, and later Ed May and colleagues, not only continued to conduct substantial research on remote viewing, but it proved to be so useful for gathering information in a unique way that it was ultimately used for thousands of operational missions by the Department of Defense. Some portions of the history of the formerly secret Stargate Program (and other projects with different code names) are in the public domain now, so it isn’t necessary to go into that here. Suffice it to say that those research programs were very carefully monitored by skeptical scientific oversight committees, who continued to recommend funding for more than two decades (as long as the program remained secret). Alcock continues:
In 1979, Robert Jahn, then Dean of Engineering and Applied Science at Princeton University, established the Princeton Engineering Anomalies Research unit to study putative paranormal phenomena such as psychokinesis. Like Schmidt, he was particularly interested in the possibility that people can predict and/or influence purely random subatomic processes. Given his superb academic and scientific credentials, his claims of success drew particular attention within the scientific community. When his laboratory closed in 1970, Jahn concluded that: “Over the laboratory’s 28-year history, thousands of such experiments, involving many millions of trials, were performed by several hundred operators. The observed effects were usually quite small, of the order of a few parts in ten thousand on average, but they compounded to highly significant statistical deviations from chance expectations.” However, parapsychologists themselves were amongst the most severe critics of his work, and their criticisms were in line with my own (Alcock, 1988). More important, several replication attempts have been unsuccessful (e.g., Jeffers, 2003), including a large-scale international effort led by Jahn himself (Jahn et al., 2000).
One more name for the failed-breakthrough list.
Other than the flub about a lab closing in 1970 that opened in 1979, again we see a cavalier dismissal of decades of research, implying that the work was systematically sloppy or methodologically naive or both. Nothing could be further from the truth. I was at Princeton for three years and spent enough time in the PEAR Lab to know that the research conducted there was as rigorously vetted and executed as any scientific project you will find anywhere. There weren’t just “several replications.” There were hundreds. The PEAR Lab’s RNG research replicated and extended Helmut Schmidt’s studies, and their remote perception research replicated and extended the SRI/SAIC remote viewing research. PEAR successfully and independently replicated both of those study designs, again and again. Even Jeffers, whom Alcock cites to suggest that the PEAR RNG work could not be replicated, was later involved in a successful RNG experiment.
And yes, it’s true: parapsychologists criticize each other’s work. Why wouldn’t they? As in any scientific discipline, those who know the most about a subject are also the most qualified to provide critiques. These debates are healthy and necessary for refining methods and interpretations, and such critiques can be found in virtually all areas of science and scholarship. It does not mean that colleagues are suggesting a wholesale dismissal of the evidence, as devout skeptics are wont to do. Alcock continues:
In the 1970s, the Ganzfeld, a concept borrowed from contemporaneous psychological research into the effects of sensory deprivation, was brought into parapsychological research. Parapsychologists reasoned that psi influences may be so subtle that they are normally drowned out by information carried through normal sensory channels. Perhaps if a participant were in a situation relatively free of normal stimulation, then extrasensory information would have a better opportunity to be recognized. The late Charles Honorton carried out a large number of Ganzfeld studies and claimed that his meta-analysis of such work substantiated the reality of psi. Hyman (1985) carried out a parallel meta-analysis, which contradicted that conclusion. Hyman and Honorton (1986) subsequently published a “Joint Communiqué” in which they agreed that the Ganzfeld results were not likely to be due to chance but that replication involving more rigorous standards was essential before final conclusions could be drawn.
Daryl Bem subsequently published an overview of Ganzfeld research in the prestigious Psychological Bulletin (Bem & Honorton, 1994), claiming that the accumulated data were clear evidence of the reality of paranormal phenomena. That effort failed to convince, in part because a number of meta-analyses have been carried out since, with contradictory results (e.g., Bem, Palmer & Broughton, 2001; Milton & Wiseman, 1999). Recently, the issue was raised again in the pages of Psychological Bulletin, with papers from Storm et al. (2010) and Hyman (2010). While the former argued that their meta-analyses demonstrate paranormal influences, Hyman pointed to serious shortcomings in their analysis and reminded us that the Ganzfeld procedure has failed to yield data that are capable of being replicated by neutral scientists.
Because of the lack of clear and replicable evidence, the Ganzfeld procedure has not lived up to the promise of providing the long-sought breakthrough that would lead to acceptance by mainstream science.
Add Honorton (and Bem, first time around) to the list.
This historical review might fit the facts for Alcock-world, but it doesn’t match what actually happened. It’s quite true that Honorton and Hyman agreed that the results available in 1986 were not due to chance and that confirmations with new data would be required to be persuasive, but Honorton subsequently provided a highly successful replication with new data, and Bem and Honorton published it in 1994.
Now we learn that this successful replication was not persuasive after all because of meta-analyses appearing six years later that presented “contradictory” results. Besides the retrocausal reason for dismissing a successful replication, is it really true that the meta-analyses were contradictory or that avowed skeptics cannot successfully replicate the effect? No, it is not. The Milton and Wiseman analysis was flawed. When proper methods were employed, their meta-analysis produced a statistically significant positive outcome. In fact, of the half-dozen meta-analyses of the Ganzfeld database published to date, every single one finds significantly positive evidence (this is discussed in the online journal NeuroQuantology in an article by Tressoldi, Storm, and Radin). So rather than being contradictory, the existing Ganzfeld database is actually completely consistent. In addition, skeptics have successfully repeated the Ganzfeld experiment (it is not obvious from that article’s title or abstract, but it is described in the paper itself).
What is the lesson from this history? It is that one should give pause when presented with new claims of impressive evidence for psi. Early excitement is often misleading, and as Ray Hyman has pointed out, it often takes up to 10 years before the shortcomings of a new approach in parapsychological research become evident.
In other words, Alcock suggests that we don’t need to pay attention to new experimental data because it will someday, eventually, maybe, be shown to be flawed in some way. If such promissory dismissals were regularly applied to any other area of science, everything would come to a grinding halt. No new findings would ever appear because research methods continually evolve and today’s data and analyses are never going to be as good as tomorrow’s.
One must also keep in mind that even the best statistical evidence cannot speak to the causes of observed statistical departures. Statistical deviations do not favour arbitrary pet hypotheses, and statistical evidence cited in support of psi could as easily support other hypotheses as well. For example, if one conducted a parapsychological experiment while praying for above-chance scoring, statistically significant outcomes could be taken as evidence for the power of prayer just as readily as for the existence of psi.
Yes, there can be many interpretations of experimental results. It is the investigators’ job to devise methods that as clearly as possible distinguish between possible explanations. But it is not necessary to have an explanation for observed results. In fact, in science, data must take priority over explanatory models. If theory is allowed to trump observation, then science collapses into dogma. From his screed, it seems that Alcock is more comfortable with dogma than with data.
. . . Obvious methodological or analytical sloppiness indicates that the implicit social contract has been violated and that we can no longer have confidence that the researcher followed best practices and minimized personal bias. As Gardner (1977) wrote, when one finds that the chemist began with dirty test tubes, one can have no confidence in the chemist’s findings, and one must wonder about other, as yet undetected, contamination. So, when considering the present research, we need not only to look at the data, but, following the metaphor, we need to assess whether Bem used clean test tubes.
Of course. This is elementary. Bem is a well-regarded experimentalist. The journal editor and the four referees who vetted his article are well aware of the dirty test tube issue. In fact, everyone who has ever conducted an experiment knows this. Could it be that Alcock is not aware just how elementary it is? Surely this cannot be the case. He was the expert who critiqued the random number generator experiments, he is adept at finding obvious flaws, and now he deftly dissects obvious problems in Bem’s experiments that others could not find. Alcock must be a highly accomplished experimentalist to have honed this skill. Let us check the online bibliographies, which archive virtually all of the peer-reviewed scientific and scholarly publications, to learn his secret. Hmm, there are no publications reporting experiments by Alcock. Not one.
And so now what’s going on here becomes clear: Only armchair critics insist that we need a perfect experiment before we can accept a surprising result. The clean test tube metaphor is a nice ideal, but in the real world there is no such thing. Nevertheless, the ideal gives the armchair critic a convenient reason to dismiss any result he or she prefers to disbelieve. If no actual flaw can be identified, then just raise suspicions or propose implausible scenarios that never actually occurred. And then just keep repeating them loudly until you intimidate others into accepting the fantasy.
In sum, for those who aren’t familiar with this research domain, Alcock’s critique – as well as many others one can find about Bem’s article – may appear reasonable at first glance. But if you do know the pertinent history, the researchers involved, the laboratories that have conducted this research, and the relevant peer-reviewed scientific literature, then the critiques’ veneers fall away and they can be seen for what they really are: angst, bluster, intolerance, and a failure of imagination.