Have you ever noticed in the marketing industry that seemingly impressive statistics, often showcasing a startling new trend, can sometimes have unclear origins?

They usually sound grand and eye-opening, the kind of stat that makes you stop in your tracks and say, ‘Wow! Really?’ But, on looking deeper, you may find that these punchy figures have been cited in ways that don’t faithfully reproduce the original stat or its source.

I’ve made it something of a pastime to debunk stats in marketing services that have suffered this fate. You probably know the stats I’m talking about – predictions that 50% of searches by 2020 will be conducted via voice, or assertions that 40% of Generation Z are shunning Google for TikTok.

These stats sound like big news, possibly too big to be true – but they’re so prevalent and widely cited (often with a reputable research body attached) that they must be describing something concrete. Presentations are delivered around them; people discuss the dynamics that have led to these shifts taking place. But frequently, the stat that gave rise to these discussions isn’t quite what it seems.

Stats like these are by no means fabricated, but they don’t faithfully reflect the original findings of the study, research or prediction. Instead, they present a version that seems bigger and more definitive.

Let me explain where this tendency comes from, why I think it’s a problem, and what can be done to combat its prevalence, especially in the world of marketing.

An industry’s love affair with statistics

Some of these offending stats that are cited over and over undergo a form of generation loss, where progressively copying copies causes an image or piece of data to degrade in quality (like a game of telephone).

To give an example of this in action, let’s look at a stat I wrote about recently: the claim that 40% of Generation Z are using TikTok for search rather than Google. You may well have seen this stat brought up in discussions of TikTok’s dominance and how young people’s online behaviour is changing.

It originated in remarks made by Google’s Senior VP, Prabhakar Raghavan, at Fortune’s Brainstorm Tech conference. However, the original quote contained several extra details that considerably change the emphasis of the stat:

  • Raghavan was talking about use of both TikTok and Instagram.
  • He said that “something like almost 40%” (important caveats) of young people use these social networks for certain types of information searching.
  • He was talking specifically about looking for restaurant inspiration, not broader types of search.

Follow-ups by TechCrunch also revealed that the statistic referred only to young people aged 18-24 in the US; moreover, it came from internal research that Google has, so far, not made public.

Many publications reported Raghavan’s words with the relevant context, but some chose in headlines or articles to focus on punchier, more interesting aspects of the news: like the fact that young people were ‘using TikTok for search’, without mentioning that this was not limited to TikTok, and was specific to restaurant discovery.

This led to a simplified version of the stat becoming more prevalent than the original; few would think to question whether it was totally accurate, particularly when the source (Google) was clearly given. Curious onlookers (or industrious B2B marketing journalists, ahem) could easily search for more information – but they might encounter one of the many articles talking about the distorted version of the stat, rather than quoting the original verbatim.

Statistics, statistics everywhere

It’s not hard to find examples of this across the marketing services industry – in presentations, content marketing, or social media posts – and sometimes in business or tech journalism more broadly. We love a good stat; bringing one in is a succinct and eye-catching way to illustrate or support an argument, or to point to evidence of a broader industry trend.

But not everyone has the time to sit at their computer and burrow down a rabbit hole until they pinpoint exactly where the claim of “50% of searches will be conducted via voice in 2020” originated. Often, naming the source seems like enough for a citation (especially in a presentation – people can go away and look up the original themselves if they want to). Blog posts and content marketing (and pitches, no doubt) may link to or reference a publication that cites the source, rather than the source itself. (In some rare cases, they may mistakenly attribute the stat to the publication reporting on it.)

Without the ability to easily check or find the original source, context is lost, along with details like date, region (a lot of stats are specific to the United States, but cited as if they are global), and demographic. These things will often make a stat a lot less broadly applicable than it seems, and sometimes totally out of date.

Also, sometimes the research and tech bodies providing stats can be less rigorous than they should be. Another one of my ‘stat bugbears’ is a figure from Google CEO Sundar Pichai’s 2016 keynote at Google I/O, in which he said that one in five search queries on mobile was conducted via voice – but only in the US, on Android, and through the Google app.

This stat frequently circulated without the attached qualifiers. What’s more, if you watch Pichai’s accompanying presentation closely, you’ll notice that Google is grouping voice commands like “directions home” and “call Grandma” into the voice ‘search’ bucket. Not quite all it seems.

Why does this stuff matter?

You might be asking: what’s the harm in one or two slightly exaggerated presentation stats? Even if these figures aren’t exactly representative of the original findings, they’re still true to broader trends, like the fact that TikTok is beginning to take on a ‘search’ role for young people.

The problem is that once you start to keep an eye out for these distorted or reframed stats, you’ll realise there are quite a lot of them about. And as I mentioned, the more they circulate, the more authoritative they seem.

And when a statistic appears to say that two-fifths of young people are using TikTok over Google, businesses will think they must be on TikTok – even when the reality is something quite different. How many businesses scrambled to optimise for voice because they heard that half of all searches would be voice in just a few years’ time?

In the marketing services industry, research, whitepapers and studies are also more abundant than ever – but as Andrew Tenzer wrote for Marketing Week in an excellent 2022 piece, not all of this research is high-quality, and some is conducted to support a particular agenda. Tenzer gives three tips for how to spot poor research, and mentions the techniques that are used to generate attractive, eye-catching stats – these are also important to watch out for.

Below are some more tips of my own to help with spotting, and avoiding, distorted stats.

Some tips when backing up your argument with secondary sources and statistics

1) If you’re citing a statistic, make sure you find the original source

Yes, this requires some digging and will take longer, particularly if the Nice Stat you have seen doesn’t itself link to the original. However, Googling the exact figure or percentage that you’re looking for, along with the source, will usually turn up the right statistic without much effort.

If it doesn’t – or if there isn’t a clear source attributed – this could be a sign that you’re looking at a distorted stat. If the source cited is a publication (and not one that publishes original research or whitepapers), try to find the specific article that gave rise to the stat. Was this publication actually citing someone else? Can you find the stat in question?

When you do track down the original, make sure you take note of the publishing date and whether the stat is as fresh as you had been led to believe. Is this finding likely to still be representative of the current tech or marketing landscape?

Also, pay attention to the context of the original stat – does it reflect what you thought the stat showed? Make sure you’re clear on the demographic that’s being talked about (if it’s not evident in the text of a study, it might be in the appendix) and the region (stats that are presented as global may often be specific to one region, like the United States, or a few select regions).

2) Ask yourself whether a stat reflects your own observations

I’m of the opinion that the ‘50% by 2020’ voice stat would never have picked up so much steam if people had stopped to consider whether this fitted with what they knew of voice search in their daily lives.

Were they using voice search regularly? Were people they knew using it regularly? Did their experiences with voice search make it seem like a good substitute for text-based web search? If not, where did they think this sudden and massive shift was going to come from?

This doesn’t work for every stat, especially statistics about groups that the marketer in question might not be a part of; if you’re not a young person, who knows how exactly they might be using TikTok? Sometimes, people will also set out to confirm a stat without questioning the underlying figure – ‘I’ve heard that 40% of young people are using TikTok for search, and so I’m going to ask them why they do that’.

As a result, the responses that they get assure them that it must be true.

Nevertheless, this thought process can puncture some of the more ambitious stats about emerging technology like VR or the metaverse. If no-one you know is using it, and no-one you know wants to use it, how likely is it that [insert-large-percentage-here] of people will be doing so in just a few years’ time?

3) Cite your stats responsibly

I know how tempting it is, once you’ve found a stat that fits perfectly into your argument like a missing puzzle piece, to simply slot it in there and ask no further questions. Or to craft a thinkpiece or presentation around it.

However, is it better to have an argument that rests entirely on a stat that’s slightly skewed – or to be able to make it without relying on skewed stats?

After you’ve done the above due diligence of tracking down the original source and subjecting the stat to the ‘sniff test’ of common sense, make sure you’re citing the stat faithfully by mentioning the context, year of origin, region (if applicable), demographic (if applicable), and any other relevant details.

If it makes the stat sound less interesting in the process – perhaps it’s time to find a more impactful stat.

In conclusion, I’d like to refer back to a point made by Tenzer as to why avoiding shoddy research is important:

“Marketers can’t absolve themselves of the role they’re playing in the normalisation of poor research. We should all be very concerned by the lack of critical thinking. If we continue to take everything at face value, companies will continue to churn out nonsense and the quality of decision making will spiral downwards.”
