Data in its many renderings, whether crunched into statistics, figures, or conclusions, can support opinion, or even outright lies, as readily as it supports facts.
Most people make points and form assumptions based on selective, albeit data-driven, statistics, which can help prove or disprove claims. Or at least we think they do.
But the truth of the matter is that statistics only produce probabilities, making conclusions provisional, and very much open to debate and change.
When numbers are used and presented misleadingly, the result is the harmful perpetuation of misinformation: statistical falsehood.
For this reason, statistics presented by journalists, scientists, and statisticians should be examined critically and objectively. That is a difficult feat, considering the range of lenses through which viewers may interpret the information.
As data skeptics, it’s our job to call out statistical falsehoods when figures fall short of the accuracy we expect.
Damned lies and statistics: How statistical falsehoods come about.
The phrase “lies, damned lies, and statistics” is not an expression without reason.
Popularized by Mark Twain, who attributed it to the 19th-century British Prime Minister Benjamin Disraeli, the phrase (describing three types of lies) is meant to address the persuasive power of numbers.
In the book How To Lie With Statistics, author Darrell Huff breaks down how statistics are sometimes used to tell false narratives, turning data into statistical falsehoods. Some of the ways this can happen are as follows:
- Inadequate samples: Statistics derived from a sample group that is either biased as a whole, or too small to be an accurate representation
- Ignored errors: Conclusions are rarely if ever set in stone, because methods themselves are based on assumptions. Sometimes, the margin of error surrounding a specific statistic is ignored, and the finding presented as overly concrete
- Cropped charts: Graphs cropped or zoomed in upon to show turbulent spikes and drops that are, in the larger picture, less significant
- Spurious correlations: One statistic is used as support for an unrelated claim, correlations in numbers are used to imply relation between two things, or false patterns are created from random data
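To make the “ignored errors” point concrete, here is a minimal sketch, using made-up poll numbers, of how a margin of error can undercut a headline figure. The proportion, sample size, and confidence level below are illustrative assumptions, not figures from Huff’s book:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a confidence interval for a sample proportion.

    p: observed proportion, n: sample size,
    z: critical value (1.96 corresponds to ~95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 500 respondents, 52% agree with some claim.
moe = margin_of_error(0.52, 500)        # roughly 0.044, or 4.4 points
low, high = 0.52 - moe, 0.52 + moe

# The interval spans roughly 47.6% to 56.4%. Since it straddles 50%,
# a "majority agrees" headline is not actually supported by this sample.
```

Presenting the bare 52 percent without that roughly four-point band is exactly the kind of overly concrete finding the list above warns about.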
Statisticians, researchers, and journalists can be guilty of statistical misuse or statistical falsehoods by honest error, negligence, or even deliberate deception. As TIME contributor Alex Perry pointed out in 2011, even reputable sources like the UN, the White House, and the New York Times sometimes get it wrong.
Perry writes, for example, about the figure that India’s middle class is 300 million people, a claim that has informed American foreign policy for a decade, even though that number was 50 million in 2005 and is less than 250 million today.
This claim ignores that nearly 78 percent of India’s population earns less than $4,376 a year, the threshold for middle class there, and that poverty, by India’s estimations, is limited to those who spend 45 to 55 cents a day, which some analysts say is a gross underestimation.
But even when statistics are presented correctly, they can be easily misconstrued. Our interpretations may be skewed, making them look like statistical falsehoods.
For example, when we are told that the average American income is $70,000, this doesn’t mean that those making less than that are in the minority. That is because ‘average’ in this case is a mathematical term: the mean, found by adding up all incomes and dividing by the number of incomes.
Averages are of little use when a minority of the numbers are extraordinarily high, as is the case with incomes, which is why 67 percent of Americans fall below the average income. (The median income is a better way to find an accurate middle ground.)
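The mean-versus-median point can be seen in a few lines of code. The income figures below are invented for illustration; the only thing that matters is that one high earner drags the mean well above what most people make:

```python
import statistics

# Hypothetical incomes: six modest earners and one very high earner.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 60_000, 400_000]

mean_income = statistics.mean(incomes)      # pulled up by the outlier
median_income = statistics.median(incomes)  # the middle value: 45,000

# Most people in this sample earn less than the "average".
below_average = sum(1 for i in incomes if i < mean_income)
# 6 of the 7 incomes fall below the mean of roughly 94,286.
```

Here the mean is roughly double the median, and six of seven people earn “below average,” which is why reports that lead with a mean income can mislead even when the number itself is correct.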
The takeaway: How to spot statistical falsehoods.
Numbers, just like words, can be easily warped, whether by agenda or accident. As critical readers, we need skepticism if our end game is to get as close to the truth as possible. So when data is manipulated or presented improperly, it’s essential that readers carefully examine statistics and conclusions before latching onto them fully.
This is important, because people tend to trust numbers more than words, and even more so if the source is a trusted one.
Going back to Huff’s book, it provides some helpful guidance on how to look at statistics with a skeptical eye. Below is our interpretation of his tips, which offer good questions for challenging data-driven claims that may be statistical falsehoods.
- Who says so? Look at the source for bias, either conscious or unconscious.
- How do they know? Look at methodology to see whether the sample might be biased.
- What’s missing? Look for context and comparison to make sense of the statistic.
- Did somebody change the subject? Watch to make sure stats aren’t used to jump to unrelated conclusions.
- Does it make sense? Think about the figure practically to gauge how plausible it is. Some claims really are too wild to be accurate.
Below is a checklist that synthesizes these queries. Feel free to use it in your news reading (on any site, even ours), and let us know if you can spot any misused stats! Feel free to inform us of any statistical falsehoods you are aware of @curiousmatic.