‘There are three kinds of lies: lies, damned lies, and statistics.’
We don’t know who originally came up with this. It wasn’t Benjamin Disraeli, though some attribute it to him. It wasn’t Mark Twain either, though he did popularise it.
When I was an undergraduate, we were recommended to read Darrell Huff’s How to Lie with Statistics, which I still have a copy of (indeed, I still have most of my degree textbooks). It’s worth a read, although it’s very old, written in the early ’50s and some of the language reflects that.
In my view, it’s not so much ‘lying with statistics’ as ‘misleading with statistics’ or ‘trying to justify our agenda’ with statistics. Well, OK, lying can come into those categories too.
But there’s certainly a lot of misunderstanding about statistics.
Many people believe that if you toss a coin 20 times and it comes up heads every time, the chance of the next toss coming up tails is higher than 50 percent because, on average, heads and tails must even out (assuming the coin is not weighted in some way). But the chance of tails is still exactly 50 percent, because each toss is independent: the coin has no memory of what came before.
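If you would rather check this than take it on trust, a small simulation helps. The sketch below is my own illustration, not a standard recipe: it tosses a simulated fair coin many times and looks only at the tosses that come straight after a run of heads. A run of five is used rather than 20, purely because runs of 20 are too rare to collect in a quick experiment. The function name, the seed and the number of tosses are all arbitrary choices.

```python
import random

def tails_rate_after_heads_streak(n_tosses=200_000, streak=5, seed=1):
    """Toss a fair coin n_tosses times and return the share of tails
    among the tosses that immediately follow `streak` or more heads in a row."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    run = 0        # current number of consecutive heads
    followed = 0   # tosses that came right after a long-enough run
    tails = 0      # how many of those tosses were tails
    for _ in range(n_tosses):
        is_heads = rng.random() < 0.5
        if run >= streak:
            followed += 1
            if not is_heads:
                tails += 1
        run = run + 1 if is_heads else 0
    return tails / followed

print(round(tails_rate_after_heads_streak(), 3))
```

The printed share should sit close to 0.5, give or take a little random noise. The streak does nothing to the odds of the next toss.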
Here’s another misunderstanding. Average life expectancy figures can mislead.
In 1841, a newborn boy was expected to live 40.2 years and a newborn girl 42.2 years. So some assume that most people died in their forties, or even that they were ‘old’ in their forties and died of old age.
Well, not really. It depends on their occupation to some extent. A manual worker is more likely to show signs of age earlier than someone in the professions or in the leisured classes. But the key words here are ‘at birth’. Childhood was a dangerous time, when diseases carried off a great many babies and children, and every one of those early deaths drags the average down. In the 1840s, around 15 percent of babies died before their first birthday.
Once a human passed the danger years, they stood a good chance of making it much further than their 40s.
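A bit of arithmetic shows how this works. The cohort below is made up: only the 15 infant deaths echo the 15 percent figure above, and every other number is invented for illustration, not taken from historical records. The point is simply how far early deaths pull down an average.

```python
# Hypothetical cohort of 100 births: (age at death, number of people).
# Only the 15 infant deaths reflect the 15 percent figure in the text;
# all the other numbers are invented for illustration.
cohort = [
    (0.5, 15),  # died before their first birthday
    (5, 15),    # died later in childhood
    (45, 20),
    (60, 30),
    (75, 20),
]

def mean_age_at_death(groups):
    """Average age at death, weighted by the number of people in each group."""
    total_people = sum(count for _, count in groups)
    total_years = sum(age * count for age, count in groups)
    return total_years / total_people

at_birth = mean_age_at_death(cohort)

# Keep only those who got through childhood (here, past the age of 10).
survivors = [(age, count) for age, count in cohort if age >= 10]
after_childhood = mean_age_at_death(survivors)

print(f"Life expectancy at birth: {at_birth:.1f}")            # 42.8
print(f"Average for childhood survivors: {after_childhood:.1f}")  # 60.0
```

In this invented cohort, life expectancy at birth is about 43, yet nobody at all dies at 43, and those who survive childhood live to 60 on average.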
Here’s another. Parents may know about ‘key stages’ in the UK education system. Children are expected to reach a certain level of attainment within each key stage, this level being measured in SATs (statutory assessment tests).
When the first tests were given to seven-year-olds, roughly half scored at level 2, the supposed average, 25 percent at level 1 and 25 percent at level 3. Cue headlines in the popular press: ‘A quarter of pupils below average’. But that is exactly what you would expect when level 2 is the middle of the scale. It’s not possible for everyone to be above average!
Don’t forget, when talking about averages, that there is mean, median and mode.
With the mean, you add up all the results (test results, salaries, number of children) in your sample and divide by the number in your sample. The median is the halfway mark: the middle value when the results are put in order. The mode is the most commonly occurring score.
So if you have 10 people with a mean income of £25,000, you might find most people earn about that amount. Or you might find that one person earns well over £100,000 and the rest earn quite a bit less than £25k. That one high earner is skewing your result.
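Here is a quick sketch of the three averages using Python’s built-in statistics module. The incomes are hypothetical, chosen only so that ten people have a mean of £25,000 while hardly anyone earns that amount.

```python
from statistics import mean, median, mode

# Hypothetical incomes in pounds: one high earner and nine people on £10,000.
incomes = [160_000] + [10_000] * 9

print("mean:  ", mean(incomes))    # total (250,000) divided by 10 people
print("median:", median(incomes))  # middle value once the incomes are sorted
print("mode:  ", mode(incomes))    # the most common income
```

The mean comes out at £25,000, but the median and the mode are both £10,000, which is a far better description of what a typical person in this group earns.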
Some years ago, I read that ‘public sector salaries’ were, on average, higher than salaries for similar occupations in the private sector. Of course they were. In many parts of the public sector, particularly the NHS, the lower-paid jobs have been privatised. This leaves the higher earners still working for the NHS, so the average salary goes up even though nobody has had a pay rise.
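The effect is easy to reproduce with made-up numbers. The roles and salaries below are hypothetical and not real NHS pay figures. The sketch just shows that moving the lower-paid jobs off the books raises the mean for those who remain, without changing anyone’s pay.

```python
from statistics import mean

# Hypothetical staff list: (role, annual salary in pounds, outsourced?).
# These are invented figures, not real NHS pay data.
staff = [
    ("cleaner", 18_000, True),
    ("porter", 19_000, True),
    ("caterer", 20_000, True),
    ("nurse", 33_000, False),
    ("manager", 50_000, False),
    ("consultant", 100_000, False),
]

before = mean(salary for _, salary, _ in staff)
after = mean(salary for _, salary, outsourced in staff if not outsourced)

print("Mean before outsourcing:", before)  # everyone counted
print("Mean after outsourcing: ", after)   # only the staff still employed directly
```

With these figures the mean jumps from £40,000 to £61,000, purely because of who is being counted.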
There once was an advertising campaign declaring that ‘eight out of 10 owners said their cat prefers…’ until people complained to the Advertising Standards Authority. The line was changed to ‘eight out of 10 owners who expressed a preference said their cat prefers…’. Better, but still pointless. Ever seen an advert for a beauty product that says ‘seven out of 10 women said…’ while the small print says ‘sample size, 74’?
Beware of small sample sizes. And sample bias too. If respondents have to make an effort to respond to a survey, the ones who bother tend to be those with strong opinions, so you’re not going to get an accurate picture of everyone else.
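To get a feel for how little a sample of 74 can tell you, here is a rough margin-of-error calculation. It is a sketch under generous assumptions that real adverts may well not meet: a simple random sample, and the usual normal approximation for a 95 percent interval. The function name is my own, and the sample of 1,000 is there only for comparison.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a sample of n.
    Assumes a simple random sample and the normal approximation."""
    return z * math.sqrt(p * (1 - p) / n)

small = margin_of_error(0.7, 74)     # 'seven out of 10', sample size 74
large = margin_of_error(0.7, 1000)   # same answer from a hypothetical 1,000 people

print(f"n=74:   plus or minus {small * 100:.1f} percentage points")
print(f"n=1000: plus or minus {large * 100:.1f} percentage points")
```

With 74 people, ‘seven out of 10’ could plausibly mean anything from about six in 10 to about eight in 10, and that is before any bias in who was asked.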
So what’s the point of statistics? Well, you can’t measure everyone and everything all the time. A sample can be a useful way of finding things out, but the survey has to be well designed and, above all, its details should be published: the sample size and diversity, the questions asked, the number of ‘don’t knows’ and so on. Scepticism in reading statistics is essential.