Spurious Correlations:
Correlation Does Not Equal Causation. By Tyler Vigen. Hachette Books. $20.
It is the bane of every
scientist, every researcher: the eager journalist, blogger, or other
non-scientist who is so excited about
that new study that proves Phases of the
Moon Cause Cancer! Or Eating a Pound
of Blueberries a Day Keeps You Alive for 200 Years! Or Drinking Wine Protects Your Liver!
Or, more seriously: eating
eggs will raise your cholesterol…too much salt endangers your heart…blood
pressure above 140/90 is an invitation to cardiovascular disease and early
death.
The first three correlations
are ludicrous, and it is hard to imagine anyone believing them. But the second,
reasonable-seeming three are no more believable – and in fact, scientists have
recently reversed these supposed “scientific discoveries,” saying that eggs and
other foods are not major culprits in too-high cholesterol; too little salt may be a bigger danger than
too much; and systolic blood pressure in the 150 range is probably just fine
and does not require the lifetime medication that doctors prescribed to so many
people under the 140/90 standard.
There is no evil conspiracy
behind this sort of scientific study and re-study, determination and
re-determination. And it is perfectly
fine to scoff at ill-reported findings that say high calcium intake causes eye
disease, heartburn causes esophageal or stomach cancer, and red wine keeps you
alive longer. Virtually all reporting outside scientific journals falls victim,
through ignorance and/or space limitations, to the confusion of correlation and
causation.
To put it simply: just because two things
are both observed in people or in life in general, that does not mean one of
them causes the other. People who take large amounts of calcium supplements are
indeed more likely to have advanced macular degeneration, a serious eye
disease, in later life – which could mean that they have a systemic condition
or genetic predisposition to the eye disease and just happen to be calcium
users; it does not mean that calcium causes the disease. Heartburn is sometimes
seen in people with esophageal or stomach cancer, but most heartburn is simply
a symptom of gastroesophageal reflux disease, and heartburn does not cause
cancer. There is a correlation between drinking red wine in moderation and
longer life in some people – but their overall lifestyle may be what leads them
both to drink the wine and to have longer lives. Correlation Does Not Equal Causation, as the subtitle of Tyler
Vigen’s book states.
Indeed, as the book’s title suggests, it is extremely easy to
find Spurious Correlations –
apparently connected events or circumstances that have absolutely nothing to do
with each other. Some of these are so familiar that they have passed into
common parlance in their fields: the hemline indicator and Super Bowl indicator
are well known on Wall Street, for example, each of them supposedly predicting
the future direction of the stock market (based, respectively, on the length of
women’s dresses and whether a team from the old AFL or original NFL – or, in a
variant, a team from the current AFC or NFC – wins the Super Bowl). Interestingly,
traders scoff at these correlations but sometimes also weave elaborate stories
to explain how they might, just might, have a grain of truth in them. These
specific items are not in Vigen’s book, but he does produce a graph showing an
81.4% correlation between closing values of the New York Stock Exchange
Composite Index from 2004 to 2011 and the ranking of the TV program Two and a Half Men against that of other
CBS shows.
There is not even a micro-grain
of veracity in the correlations that Vigen has dug up for his book – but they
are so amusing that Spurious Correlations
manages both to teach a matter of genuine importance and to insist that readers
laugh about it. Vigen’s approach is a wonderful one: he provides graphs that
show eerily parallel patterns between entirely unrelated sets of data – graphs
that seem to prove that one thing causes the other, or at the very least is
intimately related to it, when in fact they prove absolutely nothing. Thus,
there is a definite correlation between cheese consumption in the United States
between 2000 and 2009 and the number of people who died by becoming tangled in
bedsheets during the same period – the graph shows it with 94.7% correlation.
And margarine consumption during the same decade is even more clearly
correlated with the divorce rate in the state of Maine: a 98.9% correlation. Also
– oh my – there was a 96.4% correlation between E-mail spam and the use of
genetically engineered soybeans between 2001 and 2010. Quick! Someone pass a
law! And let’s boost our competitiveness
in information technology by making graduate school free for comic-book
readers: there was a 99.5% correlation between computer-science doctorates and
comic-book sales between 2003 and 2009.
Vigen’s charts are highly
amusing – and they are also highly instructive. There are so many statistics
available about so many things that finding correlations between unrelated
events is just a matter of doing a well-directed search. Then, to show those
correlations clearly enough to imply causation, be sure to choose the right
scale for your graph (Y axis) and the right time period (X axis). This is
exactly what Vigen does – but other people do the same thing for far less
humor-inducing reasons. Politicians and issue advocates are experts at
manipulating statistics to try to make people think there is causation when
there is only correlation: debates about everything from illegal immigration to
abortion are filled with manipulation of this sort. And even when correlation
and causation are confused only through ignorance or space limitations, rather
than through malice, there are serious consequences. Journalistic credibility,
to the extent that that phrase still means anything, is seriously damaged by
stories reporting that A leads to B when the research says only that A and B
both occur under the same specific circumstances or in the same group of
people. Scientific literacy, to the extent that that phrase still has meaning, is badly undermined by widely
disseminated reports that lead people to believe some important causality has
been discovered, when all that has really been found is an interesting
correlation.
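To see how little effort a "well-directed search" takes, here is a minimal Python sketch. It is not Vigen's method and uses none of his data: it assumes a few hundred made-up ten-point series (random walks standing in for unrelated yearly statistics) and simply checks every pair for the strongest correlation. The function names, the number of series, and the seed are all hypothetical choices for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def make_random_walks(count, length, seed=0):
    """Build `count` independent random walks of `length` points each.

    These are synthetic stand-ins for unrelated yearly statistics;
    no real data is involved.
    """
    rng = random.Random(seed)
    walks = []
    for _walk in range(count):
        value, walk = 0.0, []
        for _step in range(length):
            value += rng.gauss(0.0, 1.0)
            walk.append(value)
        walks.append(walk)
    return walks


def find_best_pair(series):
    """Return (i, j, r) for the pair with the largest absolute correlation."""
    best = (0, 1, 0.0)
    for i, j in itertools.combinations(range(len(series)), 2):
        r = pearson(series[i], series[j])
        if abs(r) > abs(best[2]):
            best = (i, j, r)
    return best


if __name__ == "__main__":
    # 200 unrelated series with 10 points each gives 19,900 pairs to search.
    walks = make_random_walks(count=200, length=10, seed=0)
    i, j, r = find_best_pair(walks)
    print(f"Series {i} and series {j} correlate at r = {r:.3f}")
```

With thousands of pairs to choose from, the strongest match tends to look startlingly close even though every series was generated independently of every other. The search itself manufactures the "finding."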
Vigen clearly intends Spurious Correlations as a humor book,
giving his graphs amusing headlines: “Save the planet! Knock down the old
bridges!” “A ltr 4 u.” “Beer always makes basketball better.” “Money doesn’t
grow on trees, unless that money is for bingo and those trees are houseplants.”
So by all means laugh at the absurdity of the unconnected connections that he
offers on page after page. There is, indeed, causation here: many of these
graphs will certainly elicit amusement. But remember that this book has
appeared at a time when more people than ever are ignorantly sounding off on
the Internet and elsewhere about all the causes of all the terrible things
happening in the world. That is only a correlation – right?