Philosophy-Reason - Meetup

W
Shared by: linxiaoqin
Categories
Tags
-
Stats
views:
0
posted:
1/26/2013
language:
English
pages:
15
Document Sample
scope of work template
							CONVERSATION
The Argumentative Theory
A Conversation with Hugo Mercier [4.27.11]
http://www.edge.org/




Last July, opening the Edge Seminar, "The New Science of Morality", Jonathan Haidt digressed to talk
about two recently-published papers in Behavioral and Brain Sciences which he believed were "so
important that the abstracts from them should be posted in psychology departments all over the
country."


One of the papers "Why Do Humans Reason? Arguments for an Argumentative Theory," published by
Behavioral and Brain Sciences, was by Hugo Mercier and Dan Sperber.


"The article,” Haidt said, "is a review of a puzzle that has bedeviled researchers in cognitive
psychology and social cognition for a long time. The puzzle is, why are humans so amazingly bad at
reasoning in some contexts, and so amazingly good in others?"


"Reasoning was not designed to pursue the truth. Reasoning was designed by evolution to help us win
arguments. That's why they call it The Argumentative Theory of Reasoning. So, as they put it, "The
evidence reviewed here shows not only that reasoning falls quite short of reliably delivering rational
beliefs and rational decisions. It may even be, in a variety of cases, detrimental to rationality.
Reasoning can lead to poor outcomes, not because humans are bad at it, but because they




                                                    1
systematically strive for arguments that justify their beliefs or their actions. This explains the
confirmation bias, motivated reasoning, and reason-based choice, among other things."


"Now, the authors point out that we can and do re-use our reasoning abilities. We're sitting here at a
conference. We're reasoning together. We can re-use our argumentative reasoning for other purposes.
But even there, it shows the marks of its heritage. Even there, our thought processes tend towards
confirmation of our own ideas. Science works very well as a social process, when we can come
together and find flaws in each other's reasoning. We can't find the problems in our own reasoning
very well. But, that's what other people are for, is to criticize us. And together, we hope the truth
comes out."


Dan Sperber, an influential French social and cognitive scientist, is widely recognized as being among
the most brilliant cognitive scientists writing about reason, language, culture, and human evolution.
Hugo Mercier, his former student, is a post-doc at University of Pennsylvania and coauthor with
Sperber of a number of papers.


Their Argumentative Theory has already generated much excitement in the academic community.
Reaction — from heated rejection to enthusiastic acceptation — have never been indifferent. The
paper has created a storm of interest and controversy and has has attracted attention well beyond
academic circles. Sharon Begley (Newsweek) and Jonah Lehrer (Wired) were among the many
journalists who wrote stories. In addition, many leading thinkers have taken note.


Gerd Gigerenzer finds this view on reasoning is most provocative as "reasoning is not about truth but
about convincing others when trust alone is not enough. Doing so may seem irrational, but it is in fact
social intelligence at its best." Steven Pinker notes that "The Argumentative Theory is original and
provocative, has a large degree of support, and is strikingly relevant to contemporary affairs, including
political discourse, higher education, and the nature of reason and rationality. It is likely to have a big
impact on our understanding of ourselves and current affairs."


And Jonathan Haidt says the “the article is one of my favorite papers of the last ten years. I believe
that they have solved one of the most important and longstanding puzzles in psychology: why are we
so good at reasoning in some cases, but so hopelessly biased in others? Once I read their paper, I saw
the argumentative function" of reasoning everywhere — particularly in the reasoning of people I
disagreed with, but also occasionally even in myself. They're on to a very powerful idea with many
social and educational ramifications."


— JB



HUGO MERCIER is a cognitive scientist and a post-doctoral fellow at the University of Pennsylvania in
Philadelphia. He did his doctoral work under Sperber's supervision on reasoning and argumentation
and he has published many articles on the argumentative theory, several of them in collaboration with
Sperber.




                                                     2
 Why Do Humans Reason? Arguments for an Argumentative Theory

                                               Hugo Mercier
                                         University of Pennsylvania

                                                Dan Sperber
                                      affiliation not provided to SSRN
                           http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1698090




                      Behavioral and Brain Sciences, Vol. 34, No. 2, pp. 57-74, 2011


Abstract:
Reasoning is generally seen as a means to improve knowledge and make better decisions. However,
much evidence shows that reasoning often leads to epistemic distortions and poor decisions. This
suggests that the function of reasoning should be rethought. Our hypothesis is that the function of
reasoning is argumentative. It is to devise and evaluate arguments intended to persuade. Reasoning so
conceived is adaptive given the exceptional dependence of humans on communication and their
vulnerability to misinformation. A wide range of evidence in the psychology of reasoning and decision
making can be reinterpreted and better explained in the light of this hypothesis. Poor performance in
standard reasoning tasks is explained by the lack of argumentative context. When the same problems are
placed in a proper argumentative setting, people turn out to be skilled arguers. Skilled arguers, however,
are not after the truth but after arguments supporting their views. This explains the notorious confirmation
bias. This bias is apparent not only when people are actually arguing but also when they are reasoning
proactively from the perspective of having to defend their opinions. Reasoning so motivated can distort
evaluations and attitudes and allow erroneous beliefs to persist. Proactively used reasoning also favors
decisions that are easy to justify but not necessarily better. In all these instances traditionally described
as failures or flaws, reasoning does exactly what can be expected of an argumentative device: Look for
arguments that support a given conclusion, and, ceteris paribus, favor conclusions for which arguments
can be found.

Number of Pages in PDF File: 55


Cosmic Habituation
Tuesday, May 03, 2011 - 07:20 PM

http://www.radiolab.org/blogs/radiolab-blog/2011/may/03/cosmic-habituation/
In this short, Jonathan Schooler tells us about a discovery that launched his career and led to a puzzle
that has haunted him ever since.

In the late 1980s, when Jonathan Schooler was a graduate student in psychology, he did a little study
that became a big deal. Schooler asked a group of people to watch a video of a man robbing a bank.
After watching the video, he had half of them jot down a description of the robber. And, wait for it ...
turns out that the people who took notes were significantly LESS likely to recognize the robber later.

                                                       3
Not surprisingly, this weird effect -- called "verbal overshadowing" -- got a lot of attention. (In fact, we
talked with Malcolm Gladwell about verbal overshadowing in our show about Choice). But just as
Schooler's big discovery was making it into newspapers and Psych 101 textbooks…something started
happening to his data. Each time he repeated the study, the exact same study, his once attention-
grabbing effect got smaller and smaller.

In this podcast, Schooler tells Jad and Robert about his journey to figure out what had happened to him,
and why it was happening to other scientists too. After considering all the reasonable explanations
(statistical quirks and procedural stumbles), Schooler found himself thinking that maybe, just maybe, the
laws of nature are less solid than they seem.



Unpublished results hide the decline effect
Published online 23 February 2011 | Nature470, 437 (2011) | doi:10.1038/470437a

Some effects diminish when tests are repeated. Jonathan Schooler says being open about
findings that don't make the scientific record could reveal why.

Jonathan Schooler

Many scientifically discovered effects published in the literature seem to diminish with time.
Dubbed the decline effect, this puzzling anomaly was first discovered in the 1930s in
research into parapsychology, in which the statistical significance of purported evidence for
psychic ability declined as studies were repeated. It has since been reported in a string of
fields — both in individual labs (including my own) and in meta-analyses of findings in
biology and medicine. The issue has been recognized in some circles within the scientific
community, but rose to wider prominence last December when it was discussed in an article
in the magazine The New Yorker.

Some scientists attribute the decline effect to statistical self-correction of initially
exaggerated outcomes, also known as regression to the mean. But we cannot be sure of
this interpretation, or even test it, because we do not generally have access to 'negative
results': experimental outcomes that were not noteworthy or consistent enough to pass
peer review and be published.

How could the availability of unpublished results be improved? I suggest an open-access
repository for all research findings, which would let scientists log their hypotheses and
methodologies before an experiment, and their results afterwards, regardless of outcome.
Such a database would reveal how published studies fit into the larger set of conducted
studies, and would help to answer many questions about the decline effect.

Availability of unpublished findings could also address other shortcomings of the current
scientific process, including the regular failure of scientists to report experiments, conditions
or observations that are inconsistent with hypotheses; the addition or removal of
participants and variables to generate statistical significance; and the probable existence of
numerous published findings whose non-replicability is shrouded because it is difficult to
report null results.



                                                      4
To address the decline effect, such a database could pinpoint whether the phenomenon
reflects how scientists design experiments, how they write them up or how journals decide
what to publish. It could be used to explore whether genuine changes in studied phenomena
could stem from conventional mechanisms; for example, in social sciences, decline effects
could be the result of participants no longer being naive about the effect under
investigation. Less likely, but not inconceivable, is an effect stemming from some
unconventional process. Perhaps, just as the act of observation has been suggested to
affect quantum measurements, scientific observation could subtly change some scientific
effects. Although the laws of reality are usually understood to be immutable, some
physicists, including Paul Davies, director of the BEYOND: Center for Fundamental Concepts
in Science at Arizona State University in Tempe, have observed that this should be
considered an assumption, not a foregone conclusion.

More prosaic explanations for the decline effect include the previously mentioned regression
to the mean. If early results are most likely to be reported when errors combine to magnify
the apparent effect, then published studies will show systematic bias towards initially
exaggerated findings, which are subsequently statistically self-corrected (although this
would not account for the typically linear nature of the decline).

Publication bias could also be responsible. Researchers might only be able to publish initial
findings on an effect when it is especially large, whereas follow-up studies might be more
able to report smaller effects. Other potential answers include unreported aspects of
methods, exclusive reporting of findings consistent with hypotheses, changes in researcher
enthusiasm, more rigorous methodologies used in later studies, measurement error
resulting from experimenter bias and the general difficulty of publishing failures of
replication.

An open-access database of research methods and published and unpublished findings
would go a long way towards testing these ideas. For example, both the regression to the
mean and degradation of procedure explanations assume that early published studies
benefit from being at one statistical end of a larger body of (unpublished) findings.
Publication bias and selective reporting of data are similarly difficult to investigate without
knowing about unpublished data.

An open-access repository of findings would be difficult to introduce. It would need an
automated protocol to enable study methods and results to be entered and retrieved. Some
way to assess the quality of the work would be required — perhaps through open-access
commentaries moderated in a manner similar to Wikipedia. We would need to assure the
qualifications of researchers who use it, and maintain a blackout period to protect
hypotheses and findings prior to publication. Reluctant scientists would need incentives —
and perhaps new rules from funders — to take part.

Such challenges would not be insurmountable. Similar, if more narrowly defined, databases
have already been set up for clinical trials (http://clinicaltrials.gov) and educational research
(http://pslcdatashop.web.cmu.edu). A good starting point might be to develop a host of
subject-specific repositories. However it is implemented, we need a better record of
unpublished research before we can know how well the current scientific process, based on
peer review and experimental replication, succeeds in distinguishing grounded truth from
unwarranted fallacy.




                                                5
Jonathan Schooler is a professor of psychology at the University of California, Santa
Barbara. e-mail: schooler@psych.ucsb.edu

Annals of Science


The Truth Wears Off
Is there something wrong with the scientific method?
by Jonah Lehrer December 13, 2010

Many results that are rigorously proved and accepted start shrinking in later studies.


On September 18, 2007, a few dozen neuroscientists, psychiatrists, and drug-company executives
gathered in a hotel conference room in Brussels to hear some startling news. It had to do with a class of
drugs known as atypical or second-generation antipsychotics, which came on the market in the early
nineties. The drugs, sold under brand names such as Abilify, Seroquel, and Zyprexa, had been tested on
schizophrenics in several large clinical trials, all of which had demonstrated a dramatic decrease in the
subjects’ psychiatric symptoms. As a result, second-generation antipsychotics had become one of the
fastest-growing and most profitable pharmaceutical classes. By 2001, Eli Lilly’s Zyprexa was generating
more revenue than Prozac. It remains the company’s top-selling drug.

But the data presented at the Brussels meeting made it clear that something strange was
happening: the therapeutic power of the drugs appeared to be steadily waning. A recent study
showed an effect that was less than half of that documented in the first trials, in the early
nineteen-nineties. Many researchers began to argue that the expensive pharmaceuticals weren’t
any better than first-generation antipsychotics, which have been in use since the fifties. “In fact,
sometimes they now look even worse,” John Davis, a professor of psychiatry at the University of
Illinois at Chicago, told me.

Before the effectiveness of a drug can be confirmed, it must be tested and tested again. Different
scientists in different labs need to repeat the protocols and publish their results. The test of
replicability, as it’s known, is the foundation of modern research. Replicability is how the
community enforces itself. It’s a safeguard for the creep of subjectivity. Most of the time,
scientists know what results they want, and that can influence the results they get. The premise of
replicability is that the scientific community can correct for these flaws.

But now all sorts of well-established, multiply confirmed findings have started to look
increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined
in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but
it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine,
the phenomenon seems extremely widespread, affecting not only antipsychotics but also
therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming


                                                    6
analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold
in recent decades.

For many scientists, the effect is especially troubling because of what it exposes about the
scientific process. If replication is what separates the rigor of science from the squishiness of
pseudoscience, where do we put all these rigorously validated findings that can no longer be
proved? Which results should we believe? Francis Bacon, the early-modern philosopher and
pioneer of the scientific method, once declared that experiments were essential, because they
allowed us to “put nature to the question.” But it appears that nature often gives us different
answers.

Jonathan Schooler was a young graduate student at the University of Washington in the nineteen-
eighties when he discovered a surprising new fact about language and memory. At the time, it
was widely believed that the act of describing our memories improved them. But, in a series of
clever experiments, Schooler demonstrated that subjects shown a face and asked to describe it
were much less likely to recognize the face when shown it later than those who had simply
looked at it. Schooler called the phenomenon “verbal overshadowing.”

The study turned him into an academic star. Since its initial publication, in 1990, it has been
cited more than four hundred times. Before long, Schooler had extended the model to a variety of
other tasks, such as remembering the taste of a wine, identifying the best strawberry jam, and
solving difficult creative puzzles. In each instance, asking people to put their perceptions into
words led to dramatic decreases in performance.

But while Schooler was publishing these results in highly reputable journals, a secret worry
gnawed at him: it was proving difficult to replicate his earlier findings. “I’d often still see an
effect, but the effect just wouldn’t be as strong,” he told me. “It was as if verbal overshadowing,
my big new idea, was getting weaker.” At first, he assumed that he’d made an error in
experimental design or a statistical miscalculation. But he couldn’t find anything wrong with his
research. He then concluded that his initial batch of research subjects must have been unusually
susceptible to verbal overshadowing. (John Davis, similarly, has speculated that part of the drop-
off in the effectiveness of antipsychotics can be attributed to using subjects who suffer from
milder forms of psychosis which are less likely to show dramatic improvement.) “It wasn’t a
very satisfying explanation,” Schooler says. “One of my mentors told me that my real mistake
was trying to replicate my work. He told me doing that was just setting myself up for
disappointment.”

Schooler tried to put the problem out of his mind; his colleagues assured him that such things
happened all the time. Over the next few years, he found new research questions, got married
and had kids. But his replication problem kept on getting worse. His first attempt at replicating
the 1990 study, in 1995, resulted in an effect that was thirty per cent smaller. The next year, the
size of the effect shrank another thirty per cent. When other labs repeated Schooler’s
experiments, they got a similar spread of data, with a distinct downward trend. “This was
profoundly frustrating,” he says. “It was as if nature gave me this great result and then tried to
take it back.” In private, Schooler began referring to the problem as “cosmic habituation,” by
analogy to the decrease in response that occurs when individuals habituate to particular stimuli.

                                                  7
“Habituation is why you don’t notice the stuff that’s always there,” Schooler says. “It’s an
inevitable process of adjustment, a ratcheting down of excitement. I started joking that it was like
the cosmos was habituating to my ideas. I took it very personally.”

Schooler is now a tenured professor at the University of California at Santa Barbara. He has
curly black hair, pale-green eyes, and the relaxed demeanor of someone who lives five minutes
away from his favorite beach. When he speaks, he tends to get distracted by his own digressions.
He might begin with a point about memory, which reminds him of a favorite William James
quote, which inspires a long soliloquy on the importance of introspection. Before long, we’re
looking at pictures from Burning Man on his iPhone, which leads us back to the fragile nature of
memory.

Although verbal overshadowing remains a widely accepted theory—it’s often invoked in the
context of eyewitness testimony, for instance—Schooler is still a little peeved at the cosmos. “I
know I should just move on already,” he says. “I really should stop talking about this. But I
can’t.” That’s because he is convinced that he has stumbled on a serious problem, one that
afflicts many of the most exciting new ideas in psychology.

One of the first demonstrations of this mysterious phenomenon came in the early nineteen-
thirties. Joseph Banks Rhine, a psychologist at Duke, had developed an interest in the possibility
of extrasensory perception, or E.S.P. Rhine devised an experiment featuring Zener cards, a
special deck of twenty-five cards printed with one of five different symbols: a card was drawn
from the deck and the subject was asked to guess the symbol. Most of Rhine’s subjects guessed
about twenty per cent of the cards correctly, as you’d expect, but an undergraduate named Adam
Linzmayer averaged nearly fifty per cent during his initial sessions, and pulled off several
uncanny streaks, such as guessing nine cards in a row. The odds of this happening by chance are
about one in two million. Linzmayer did it three times.

Rhine documented these stunning results in his notebook and prepared several papers for
publication. But then, just as he began to believe in the possibility of extrasensory perception, the
student lost his spooky talent. Between 1931 and 1933, Linzmayer guessed at the identity of
another several thousand cards, but his success rate was now barely above chance. Rhine was
forced to conclude that the student’s “extra-sensory perception ability has gone through a marked
decline.” And Linzmayer wasn’t the only subject to experience such a drop-off: in nearly every
case in which Rhine and others documented E.S.P. the effect dramatically diminished over time.
Rhine called this trend the “decline effect.”

Schooler was fascinated by Rhine’s experimental struggles. Here was a scientist who had
repeatedly documented the decline of his data; he seemed to have a talent for finding results that
fell apart. In 2004, Schooler embarked on an ironic imitation of Rhine’s research: he tried to
replicate this failure to replicate. In homage to Rhine’s interests, he decided to test for a
parapsychological phenomenon known as precognition. The experiment itself was
straightforward: he flashed a set of images to a subject and asked him or her to identify each one.
Most of the time, the response was negative—the images were displayed too quickly to register.
Then Schooler randomly selected half of the images to be shown again. What he wanted to know
was whether the images that got a second showing were more likely to have been identified the

                                                 8
first time around. Could subsequent exposure have somehow influenced the initial results? Could
the effect become the cause?

The craziness of the hypothesis was the point: Schooler knows that precognition lacks a
scientific explanation. But he wasn’t testing extrasensory powers; he was testing the decline
effect. “At first, the data looked amazing, just as we’d expected,” Schooler says. “I couldn’t
believe the amount of precognition we were finding. But then, as we kept on running subjects,
the effect size”—a standard statistical measure—“kept on getting smaller and smaller.” The
scientists eventually tested more than two thousand undergraduates. “In the end, our results
looked just like Rhine’s,” Schooler said. “We found this strong paranormal effect, but it
disappeared on us.”

The most likely explanation for the decline is an obvious one: regression to the mean. As the
experiment is repeated, that is, an early statistical fluke gets cancelled out. The extrasensory
powers of Schooler’s subjects didn’t decline—they were simply an illusion that vanished over
time. And yet Schooler has noticed that many of the data sets that end up declining seem
statistically solid—that is, they contain enough data that any regression to the mean shouldn’t be
dramatic. “These are the results that pass all the tests,” he says. “The odds of them being random
are typically quite remote, like one in a million. This means that the decline effect should almost
never happen. But it happens all the time! Hell, it’s happened to me multiple times.” And this is
why Schooler believes that the decline effect deserves more attention: its ubiquity seems to
violate the laws of statistics. “Whenever I start talking about this, scientists get very nervous,” he
says. “But I still want to know what happened to my results. Like most scientists, I assumed that
it would get easier to document my effect over time. I’d get better at doing the experiments, at
zeroing in on the conditions that produce verbal overshadowing. So why did the opposite
happen? I’m convinced that we can use the tools of science to figure this out. First, though, we
have to admit that we’ve got a problem.”

In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a
remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the
asymmetrical appearance of a creature was directly linked to the amount of mutation in its
genome, so that more mutations led to more “fluctuating asymmetry.” (An easy way to measure
asymmetry in humans is to compare the length of the fingers on each hand.) What Møller
discovered is that female barn swallows were far more likely to mate with male birds that had
long, symmetrical feathers. This suggested that the picky females were using symmetry as a
proxy for the quality of male genes. Møller’s paper, which was published in Nature, set off a
frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality,
and females could be shown to gravitate toward it. Aesthetics was really about genetics.

In the three years following, there were ten independent tests of the role of fluctuating
asymmetry in sexual selection, and nine of them found a relationship between symmetry and
male reproductive success. It didn’t matter if scientists were looking at the hairs on fruit flies or
replicating the swallow studies—females seemed to prefer males with mirrored halves. Before
long, the theory was applied to humans. Researchers found, for instance, that women preferred
the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other
studies claimed that females had more orgasms when their partners were symmetrical, while a

                                                  9
paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that
symmetrical men were consistently rated as better dancers.

Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry
and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the
subject, and only four got a positive result. By 1998, when there were twelve additional
investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still,
even the studies that yielded some positive result showed a steadily declining effect size.
Between 1992 and 1997, the average effect size shrank by eighty per cent.

And it’s not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian
National University, set out to analyze “temporal trends” across a wide range of subjects in
ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses
(that is, statistical syntheses of related studies), and discovered a consistent decline effect over
time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous
variables were controlled for—Jennions knew, for instance, that the same author might publish
several critical papers, which could distort his analysis—there was still a significant decrease in
the validity of the hypothesis, often within a year of publication. Jennions admits that his
findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very
sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts,
the stuff that’s supposed to stand the test of time. But when you see these trends you become a
little more skeptical of things.”

What happened? Leigh Simmons, a biologist at the University of Western Australia, suggested
one explanation when he told me about his initial enthusiasm for the theory: “I was really excited
by fluctuating asymmetry. The early studies made the effect look very robust.” He decided to
conduct a few experiments of his own, investigating symmetry in male horned beetles.
“Unfortunately, I couldn’t find the effect,” he said. “But the worst part was that when I submitted
these null results I had difficulty getting them published. The journals only wanted confirming
data. It was too exciting an idea to disprove, at least back then.” For Simmons, the steep rise and
slow fall of fluctuating asymmetry is a clear example of a scientific paradigm, one of those
intellectual fads that both guide and constrain research: after a new paradigm is proposed, the
peer-review process is tilted toward positive results. But then, after a few years, the academic
incentives shift—the paradigm has become entrenched—so that the most notable results are now
those that disprove the theory.

Jennions, similarly, argues that the decline effect is largely a product of publication bias, or the
tendency of scientists and scientific journals to prefer positive data over null results, which is
what happens when no effect is found. The bias was first identified by the statistician Theodore
Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological
studies with statistically significant data found the effect they were looking for. A “significant”
result is defined as any data point that would be produced by chance less than five per cent of the
time. This ubiquitous test was invented in 1922 by the English mathematician Ronald Fisher,
who picked five per cent as the boundary line, somewhat arbitrarily, because it made pencil and
slide-rule calculations easier. Sterling saw that if ninety-seven per cent of psychology studies
were proving their hypotheses, either psychologists were extraordinarily lucky or they published

                                                 10
only the outcomes of successful experiments. In recent years, publication bias has mostly been
seen as a problem for clinical trials, since pharmaceutical companies are less interested in
publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias
also produces major distortions in fields without large corporate incentives, such as psychology
and ecology.

While publication bias almost certainly plays a role in the decline effect, it remains an
incomplete explanation. For one thing, it fails to account for the initial prevalence of positive
results among studies that never even get submitted to journals. It also fails to explain the
experience of people like Schooler, who have been unable to replicate their initial data despite
their best efforts. Richard Palmer, a biologist at the University of Alberta, who has studied the
problems surrounding fluctuating asymmetry, suspects that an equally significant issue is the
selective reporting of results—the data that scientists choose to document in the first place.
Palmer’s most convincing evidence relies on a statistical tool known as a funnel graph. When a
large number of studies have been done on a single subject, the data should follow a pattern:
studies with a large sample size should all cluster around a common value—the true result—
whereas those with a smaller sample size should exhibit a random scattering, since they’re
subject to greater sampling error. This pattern gives the graph its name, since the distribution
resembles a funnel.

The funnel graph visually captures the distortions of selective reporting. For instance, after
Palmer plotted every study of fluctuating asymmetry, he noticed that the distribution of results
with smaller sample sizes wasn’t random at all but instead skewed heavily toward positive
results. Palmer has since documented a similar problem in several other contested subject areas.
“Once I realized that selective reporting is everywhere in science, I got quite depressed,” Palmer
told me. “As a researcher, you’re always aware that there might be some nonrandom patterns,
but I had no idea how widespread it is.” In a recent review article, Palmer summarized the
impact of selective reporting on his field: “We cannot escape the troubling conclusion that
some—perhaps many—cherished generalities are at best exaggerated in their biological
significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”

Palmer emphasizes that selective reporting is not the same as scientific fraud. Rather, the
problem seems to be one of subtle omissions and unconscious misperceptions, as researchers
struggle to make sense of their results. Stephen Jay Gould referred to this as the “shoehorning”
process. “A lot of scientific measurement is really hard,” Simmons told me. “If you’re talking
about fluctuating asymmetry, then it’s a matter of minuscule differences between the right and
left sides of an animal. It’s millimetres of a tail feather. And so maybe a researcher knows that
he’s measuring a good male”—an animal that has successfully mated—“and he knows that it’s
supposed to be symmetrical. Well, that act of measurement is going to be vulnerable to all sorts
of perception biases. That’s not a cynical statement. That’s just the way human beings work.”

One of the classic examples of selective reporting concerns the testing of acupuncture in
different countries. While acupuncture is widely accepted as a medical treatment in various
Asian countries, its use is much more contested in the West. These cultural differences have
profoundly influenced the results of clinical trials. Between 1966 and 1995, there were forty-
seven studies of acupuncture in China, Taiwan, and Japan, and every single trial concluded that

                                                 11
acupuncture was an effective treatment. During the same period, there were ninety-four clinical
trials of acupuncture in the United States, Sweden, and the U.K., and only fifty-six per cent of
these studies found any therapeutic benefits. As Palmer notes, this wide discrepancy suggests
that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want
to see. Our beliefs are a form of blindness.

John Ioannidis, an epidemiologist at Stanford University, argues that such distortions are a
serious issue in biomedical research. “These exaggerations are why the decline has become so
common,” he says. “It’d be really great if the initial studies gave us an accurate summary of
things. But they don’t. And so what happens is we waste a lot of money treating millions of
patients and doing lots of follow-up studies on other themes based on results that are
misleading.” In 2005, Ioannidis published an article in the Journal of the American Medical
Association that looked at the forty-nine most cited clinical-research studies in three major
medical journals. Forty-five of these studies reported positive results, suggesting that the
intervention being tested was effective. Because most of these studies were randomized
controlled trials—the “gold standard” of medical evidence—they tended to have a significant
impact on clinical practice, and led to the spread of treatments such as hormone replacement
therapy for menopausal women and daily low-dose aspirin to prevent heart attacks and strokes.
Nevertheless, the data Ioannidis found were disturbing: of the thirty-four claims that had been
subject to replication, forty-one per cent had either been directly contradicted or had their effect
sizes significantly downgraded.

The situation is even worse when a subject is fashionable. In recent years, for instance, there
have been hundreds of studies on the various genes that control the differences in disease risk
between men and women. These findings have included everything from the mutations
responsible for the increased risk of schizophrenia to the genes underlying hypertension.
Ioannidis and his colleagues looked at four hundred and thirty-two of these claims. They quickly
discovered that the vast majority had serious flaws. But the most troubling fact emerged when he
looked at the test of replication: out of four hundred and thirty-two claims, only a single one was
consistently replicable. “This doesn’t mean that none of these claims will turn out to be true,” he
says. “But, given that most of them were done badly, I wouldn’t hold my breath.”

According to Ioannidis, the main problem is that too many researchers engage in what he calls
“significance chasing,” or finding ways to interpret the data so that it passes the statistical test of
significance—the ninety-five-per-cent boundary invented by Ronald Fisher. “The scientists are
so eager to pass this magical test that they start playing around with the numbers, trying to find
anything that seems worthy,” Ioannidis says. In recent years, Ioannidis has become increasingly
blunt about the pervasiveness of the problem. One of his most cited papers has a deliberately
provocative title: “Why Most Published Research Findings Are False.”

The problem of selective reporting is rooted in a fundamental cognitive flaw, which is that we
like proving ourselves right and hate being wrong. “It feels good to validate a hypothesis,”
Ioannidis said. “It feels even better when you’ve got a financial interest in the idea or your career
depends upon it. And that’s why, even after a claim has been systematically disproven”—he
cites, for instance, the early work on hormone replacement therapy, or claims involving various



                                                  12
vitamins—“you still see some stubborn researchers citing the first few studies that show a strong
effect. They really want to believe that it’s true.”

That’s why Schooler argues that scientists need to become more rigorous about data collection
before they publish. “We’re wasting too much time chasing after bad studies and underpowered
experiments,” he says. The current “obsession” with replicability distracts from the real problem,
which is faulty design. He notes that nobody even tries to replicate most science papers—there
are simply too many. (According to Nature, a third of all studies never even get cited, let alone
repeated.) “I’ve learned the hard way to be exceedingly careful,” Schooler says. “Every
researcher should have to spell out, in advance, how many subjects they’re going to use, and
what exactly they’re testing, and what constitutes a sufficient level of proof. We have the tools to
be much more transparent about our experiments.”

In a forthcoming paper, Schooler recommends the establishment of an open-source database, in
which researchers are required to outline their planned investigations and document all their
results. “I think this would provide a huge increase in access to scientific work and give us a
much better way to judge the quality of an experiment,” Schooler says. “It would help us finally
deal with all these issues that the decline effect is exposing.”

Although such reforms would mitigate the dangers of publication bias and selective reporting,
they still wouldn’t erase the decline effect. This is largely because scientific research will always
be shadowed by a force that can’t be curbed, only contained: sheer randomness. Although little
research has been done on the experimental dangers of chance and happenstance, the research
that exists isn’t encouraging.

In the late nineteen-nineties, John Crabbe, a neuroscientist at the Oregon Health and Science
University, conducted an experiment that showed how unknowable chance events can skew tests
of replicability. He performed a series of experiments on mouse behavior in three different
science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he
conducted the experiments, he tried to standardize every variable he could think of. The same
strains of mice were used in each lab, shipped on the same day from the same supplier. The
animals were raised in the same kind of enclosure, with the same brand of sawdust bedding.
They had been exposed to the same amount of incandescent light, were living with the same
number of littermates, and were fed the exact same type of chow pellets. When the mice were
handled, it was with the same kind of surgical glove, and when they were tested it was on the
same equipment, at the same time in the morning.

The premise of this test of replicability, of course, is that each of the labs should have generated
the same pattern of results. “If any set of experiments should have passed the test, it should have
been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe
injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on
average, six hundred centimetres more than they normally did; in Albany they moved seven
hundred and one additional centimetres. But in the Edmonton lab they moved more than five
thousand additional centimetres. Similar deviations were observed in a test of anxiety.
Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of
mouse proved most anxious, while in Albany another strain won that distinction.

                                                 13
The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are
nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting
new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand.
The problem, of course, is that such dramatic findings are also the most likely to get published in
prestigious journals, since the data are both statistically significant and entirely unexpected.
Grants get written, follow-up studies are conducted. The end result is a scientific accident that
can take years to unravel.

This suggests that the decline effect is actually a decline of illusion. While Karl Popper imagined
falsification occurring with a single, definitive experiment—Galileo refuted Aristotelian
mechanics in an afternoon—the process turns out to be much messier than that. Many scientific
theories continue to be considered true even after failing numerous experimental tests. Verbal
overshadowing might exhibit the decline effect, but it remains extensively relied upon within the
field. The same holds for any number of phenomena, from the disappearing benefits of second-
generation antipsychotics to the weak coupling ratio exhibited by decaying neutrons, which
appears to have fallen by more than ten standard deviations between 1969 and 2001. Even the
law of gravity hasn’t always been perfect at predicting real-world phenomena. (In one test,
physicists measuring gravity by means of deep boreholes in the Nevada desert found a two-and-
a-half-per-cent discrepancy between the theoretical predictions and the actual data.) Despite
these findings, second-generation antipsychotics are still widely prescribed, and our model of the
neutron hasn’t changed. The law of gravity remains the same.

Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas
generate conflicting results and suffer from falling effect sizes, they continue to get cited in the
textbooks and drive standard medical practice. Why? Because these ideas seem true. Because
they make sense. Because we can’t bear to let them go. And this is why the decline effect is so
troubling. Not because it reveals the human fallibility of science, in which data are tweaked and
beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not
because it reveals that many of our most exciting theories are fleeting fads and will soon be
rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling
because it reminds us how difficult it is to prove anything. We like to pretend that our
experiments define the truth for us. But that’s often not the case. Just because an idea is true
doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true.
When the experiments are done, we still have to choose what to believe. ♦


Choice
http://www.radiolab.org/2008/nov/17/
Logic and emotion aren't the only forces that guide or decisions. This hour of Radiolab: what
other factors steer our choices?
We turn up the volume on the voices in our heads, and try to make sense of the babble. Forget
free will, some important decisions could come down to a steaming cup of coffee.

See: radiolab111408_Choice.mp3


                                                 14
How Much Is Too Much?
http://www.radiolab.org/2008/nov/17/how-much-is-too-much/

Turns out, Robert is more impulsive than Jad, and Jad is more analytical than Robert. Shocking, right?
Sadly for Jad, Robert's style may help him better navigate the overwhelming number of choices available
throughout modern life's expanse of options, which may also lead him to a greater sense of well-being,
according to psychologist Barry Schwartz. Jonah Lehrer helps us understand why by introducing us to
George Miller's classic paper "The Magical Number Seven, Plus or Minus Two," which explains the ability
of the average human to hold about seven pieces of discreet information in working memory at any
given time. Any more than that, and, as researcher Baba Shiv demonstrates, our good judgment can be
overwhelmed...a problem Oliver Sacks overcomes by allowing himself only limited options and a strict
routine.

Barry's book, The Paradox of Choice
Berkeley Bowl
The Magical Number Seven, Plus or Minus Two
An Article about Baba Shiv's work with fruit and cake




                                                   15

						
Related docs
Other docs by linxiaoqin