June 25, 2015


Spurious Correlations: Correlation Does Not Equal Causation. By Tyler Vigen. Hachette Books. $20.

     It is the bane of every scientist, every researcher: the eager journalist, blogger, or other non-scientist who is so excited about that new study that proves Phases of the Moon Cause Cancer! Or Eating a Pound of Blueberries a Day Keeps You Alive for 200 Years! Or Drinking Wine Protects Your Liver!

     Or, more seriously: eating eggs will raise your cholesterol…too much salt endangers your heart…blood pressure above 140/90 is an invitation to cardiovascular disease and early death.

     The first three correlations are ludicrous, and it is hard to imagine anyone believing them. But the second, reasonable-seeming three are no more believable – and in fact, scientists have recently reversed these supposed “scientific discoveries,” saying that eggs and other foods are not major culprits in too-high cholesterol; too little salt may be a bigger danger than too much; and systolic blood pressure in the 150 range is probably just fine and does not require the lifetime medication that doctors prescribed to so many people under the 140/90 standard.

     There is no evil conspiracy behind this sort of scientific study and re-study, determination and re-determination.  And it is perfectly fine to scoff at ill-reported findings that say high calcium intake causes eye disease, heartburn causes esophageal or stomach cancer, and red wine keeps you alive longer. Virtually all reporting outside scientific journals falls victim, through ignorance and/or space limitations, to the confusion of correlation and causation.

     To put it simply: just because two things are both observed in people or in life in general, that does not mean one of them causes the other. People who take large amounts of calcium supplements are indeed more likely to have advanced macular degeneration, a serious eye disease, in later life – which could mean that they have a systemic condition or genetic predisposition to the eye disease and just happen to be calcium users; it does not mean that calcium causes the disease. Heartburn is sometimes seen in people with esophageal or stomach cancer, but most heartburn is simply a symptom of gastroesophageal reflux disease, and heartburn does not cause cancer. There is a correlation between drinking red wine in moderation and longer life in some people – but their overall lifestyle may be what leads them both to drink the wine and to have longer lives. Correlation Does Not Equal Causation, as the subtitle of Tyler Vigen’s book states.

     Indeed, as the book’s title states, it is extremely easy to find Spurious Correlations – apparently connected events or circumstances that have absolutely nothing to do with each other. Some of these are so familiar that they have passed into common parlance in their fields: the hemline indicator and Super Bowl indicator are well-known on Wall Street, for example, each of them supposedly predicting the future direction of the stock market (based, respectively, on the length of women’s dresses and whether a team from the old AFL or original NFL – or, in a variant, a team from the current AFC or NFC – wins the Super Bowl). Interestingly, traders scoff at these correlations but sometimes also weave elaborate stories to explain how they might, just might, have a grain of truth in them. These specific items are not in Vigen’s book, but he does produce a graph showing an 81.4% correlation between closing values of the New York Stock Exchange Composite Index from 2004 to 2011 and the ranking of the TV program Two and a Half Men against that of other CBS shows.

     There is not even a micro-grain of veracity in the correlations that Vigen has dug up for his book – but they are so amusing that Spurious Correlations manages both to teach a matter of genuine importance and to insist that readers laugh about it. Vigen’s approach is a wonderful one: he provides graphs that show eerily parallel patterns between entirely unrelated sets of data – graphs that seem to prove that one thing causes the other, or at the very least is intimately related to it, when in fact they prove absolutely nothing. Thus, there is a definite correlation between cheese consumption in the United States between 2000 and 2009 and the number of people who died by becoming tangled in bedsheets during the same period – the graph shows a 94.7% correlation. And margarine consumption during the same decade is even more clearly correlated with the divorce rate in the state of Maine: a 98.9% correlation. Also – oh my – there was a 96.4% correlation between E-mail spam and the use of genetically engineered soybeans between 2001 and 2010. Quick! Someone pass a law! And let’s boost our competitiveness in information technology by making graduate school free for comic-book readers: there was a 99.5% correlation between computer-science doctorates and comic-book sales between 2003 and 2009.

     Vigen’s charts are highly amusing – and they are also highly instructive. There are so many statistics available about so many things that finding correlations between unrelated events is just a matter of doing a well-directed search. Then, to show those correlations clearly enough to imply causation, be sure to choose the right scale for your graph (Y axis) and the right time period (X axis). This is exactly what Vigen does – but other people do the same thing for far less humor-inducing reasons. Politicians and issue advocates are experts at manipulating statistics to try to make people think there is causation when there is only correlation: debates about everything from illegal immigration to abortion are filled with manipulation of this sort. And even when correlation and causation are confused only through ignorance or space limitations, rather than through malice, there are serious consequences. Journalistic credibility, to the extent that that phrase still means anything, is seriously damaged by stories reporting that A leads to B when the research says only that A and B both occur under the same specific circumstances or in the same group of people. Scientific literacy, to the extent that that phrase still has meaning, is badly undermined by widely disseminated reports that lead people to believe some important causality has been discovered, when all that has really been found is an interesting correlation.
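
     For readers who want to see how little effort such a "well-directed search" takes, here is a minimal sketch in Python. It is not Vigen's method or data: the series are made-up random walks, the function names (`pearson`, `random_walk`, `best_spurious_pair`) and the parameters (200 series, 10 points each, seed 0) are hypothetical choices for illustration, and the measure used is the ordinary Pearson correlation coefficient, which is assumed here to be comparable to the percentages quoted in the book. The point is only that among enough unrelated short series, some pair will line up almost perfectly by chance.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, length):
    """A made-up 'annual statistic': each value is the last plus random noise."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk


def best_spurious_pair(num_series=200, length=10, seed=0):
    """Generate unrelated series, then search every pair for the strongest correlation.

    All parameters are illustrative assumptions, not figures from the book.
    """
    rng = random.Random(seed)
    series = [random_walk(rng, length) for _ in range(num_series)]
    best = max(
        combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


if __name__ == "__main__":
    (i, j), r = best_spurious_pair()
    print(f"Series {i} and series {j} 'correlate' at {abs(r):.1%}")
```

     Nothing connects any of these series to any other, yet the search reliably turns up a pair that would look striking on a chart. Short time spans make this easier, which is one reason the choice of time period matters so much.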

     Vigen clearly intends Spurious Correlations as a humor book, giving his graphs amusing headlines: “Save the planet! Knock down the old bridges!” “A ltr 4 u.” “Beer always makes basketball better.” “Money doesn’t grow on trees, unless that money is for bingo and those trees are houseplants.” So by all means laugh at the absurdity of the unconnected connections that he offers on page after page. There is, indeed, causation here: many of these graphs will certainly elicit amusement. But remember that this book has appeared at a time when more people than ever are ignorantly sounding off on the Internet and elsewhere about all the causes of all the terrible things happening in the world. That is only a correlation – right?
