What does the future hold for voice search? If you search the web for these words – or a version of them – you’ll encounter no shortage of grand predictions.
“By 2020, 30% of web browsing sessions will be done without a screen.” Or, “By 2020, 50% of all searches will be conducted via voice.” (I’ll come back to that one in a second). Or, “2017 will be the year of voice search.” Oops, looks like we might have missed the boat on that last one.
The great thing about the future is that no-one can know exactly what’s going to happen, but you can have fun throwing out wild predictions, which most people will have forgotten about by the time we actually get there.
That’s why you get so many sweeping, ambitious, and often contradictory forecasts doing the rounds – especially with a sexy, futuristic technology like voice. It doesn’t do anyone any real harm, unless for some reason your company has decided to stake its entire marketing budget on optimising for the 50% of the populace who are predicted to be using voice search by 2020.
However, in this state of voice search series, I’ve set out to take a realistic look at voice search in 2018, beyond the hype, to determine what opportunities it really presents for marketers. But when it comes to predicting the future, things get a little murkier.
I’ve made some cautious predictions: that if smart speaker ownership increases over the coming years, voice search volume will likely increase too; or that mobile voice search might be dropping away as smart speaker voice search catches on.
In this article, though, I’ll be looking at where voice search as a whole could be going: not just on mobile, or on smart speakers, but of any kind. What is the likelihood that voice search will go “mainstream” to the point that it makes up as substantial a portion of overall search volume as is predicted? What are the obstacles to that? And what does this mean for the future of voice optimisation?
Will half of all searches by 2020 really be voice searches?
I’m going to start by looking at one of the most popular predictions that is cited in relation to voice search: “By 2020, 50% of all searches will be carried out via voice.”
This statistic is popularly attributed to comScore, but as is often the case with stats, things have become a little distorted in the retelling. The original prediction behind this stat actually came from Andrew Ng, then Chief Scientist at Baidu. In an exclusive interview with Fast Company in September 2014, he stated that “In five years’ time, at least 50% of all searches are going to be either through images or speech.”
The quote was then popularised by Mary Meeker, who included it on a timeline of voice search in her Internet Trends 2016 Report, with “2020” as the year by which this prediction was slated to come true.
So, not just voice search, but voice and visual search. This makes things a little trickier to benchmark, not least because we don’t have any statistics yet on how many searches are carried out through images. (I’m assuming this would include the likes of Google Lens and Pinterest Lens, as well as Google reverse image search).
Let’s assume for the sake of argument that 35% of Ng’s predicted 50% of searches will be voice search, since voice technology is that bit more widespread and well-supported, while visual search is largely still in its infancy. How far along are we towards reaching that benchmark?
I’m going to be generous here and count voice queries of every kind in my calculations, even though, as I indicated in Part 1, only around 20% of these searches can actually be ranked for. Around 60% of Google searches are carried out on mobile (per Hitwise), so if we use Google’s most recent stat that 1 in every 5 mobile searches is carried out via voice, that means about 12% of all Google searches (420 million searches) are mobile voice queries.
In Part 2 I estimated that another 26.4 million queries are carried out via smart speakers, which is an additional 0.75% – so in total that makes 12.75% of searches, or if we’re rounding up, 13% of Google searches that are voice queries.
This means that the share of Google searches made by voice would need to increase by another 22 percentage points over the next year and a half for Ng’s prediction to come true. To reach 50% – the stat most often cited by voice enthusiasts as to why voice is so crucial to optimise for – we would need to find an additional 1.3 billion voice searches per day from somewhere.
That’s nearly ten times the number of smart speakers predicted to ship to the US over the next three years. Even if you believe that smart speakers will single-handedly bring voice search into the mainstream, it’s a tall order.
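The arithmetic above can be checked with a short Python sketch. Every input is a figure quoted in this article, except the total of 3.5 billion Google searches per day, which is an assumption inferred from the statement that 12% of all searches equals 420 million. The function and variable names are illustrative only.

```python
# Rough check of the voice search share estimate.
# ASSUMPTION: 3.5 billion Google searches per day, inferred from
# "12% of all Google searches (420 million searches)".
TOTAL_DAILY_SEARCHES = 3_500_000_000

MOBILE_SHARE = 0.60               # share of Google searches made on mobile (Hitwise)
MOBILE_VOICE_SHARE = 0.20         # 1 in 5 mobile searches is by voice (Google)
SMART_SPEAKER_QUERIES = 26.4e6    # daily smart speaker queries (estimate from Part 2)


def voice_search_summary(
    total=TOTAL_DAILY_SEARCHES,
    mobile_share=MOBILE_SHARE,
    mobile_voice_share=MOBILE_VOICE_SHARE,
    speaker_queries=SMART_SPEAKER_QUERIES,
):
    """Return the estimated voice share and the gaps to the 35% and 50% marks."""
    mobile_voice_queries = total * mobile_share * mobile_voice_share
    voice_queries = mobile_voice_queries + speaker_queries
    voice_share = voice_queries / total
    return {
        "mobile_voice_queries": mobile_voice_queries,
        "voice_share": voice_share,
        # Gap to the assumed 35% voice share, using the rounded percentage
        # as the article does.
        "points_short_of_35": 35 - round(voice_share * 100),
        # Extra daily voice searches needed for voice to make up half of all searches.
        "extra_for_half": 0.5 * total - voice_queries,
    }


if __name__ == "__main__":
    summary = voice_search_summary()
    print(f"Mobile voice queries per day: {summary['mobile_voice_queries']:,.0f}")
    print(f"Voice share of all searches:  {summary['voice_share']:.2%}")
    print(f"Points short of 35%:          {summary['points_short_of_35']}")
    print(f"Extra searches needed for 50%: {summary['extra_for_half']:,.0f}")
```

Under those assumptions, the sketch gives a voice share of roughly 12.75% and a shortfall of roughly 1.3 billion searches per day against the 50% figure. Changing any one input, such as the mobile share, shows how sensitive the estimate is to the underlying stats.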
So okay, we’ve established that voice enthusiasts might need to cool their jets a bit when it comes to the adoption of voice search. But if we return to (our interpretation of) Andrew Ng’s prediction that 35% of searches by 2020 will be voice, what is going to make the volume of voice search leap up those remaining 22 percentage points in less than two years?
Is it sheer volume of voice device ownership? Is it the increasing normalisation of speaking aloud to a device in public? Or is it something else?
Ng made another prediction, via Twitter this time, in December 2016 which gives us a clue as to his thinking in this regard. He wrote, “As speech-recognition accuracy goes from 95% to 99%, we’ll go from barely using it to using all the time!”
As speech-recognition accuracy goes from 95% to 99%, we'll go from barely using it to using all the time! https://t.co/TfjqJLDTPJ
— Andrew Ng (@AndrewYNg) December 16, 2016
So, Andrew Ng believes that sheer accuracy of recognition is what will take voice search into the mainstream. 95% word recognition is actually the same threshold of accuracy that humans achieve when listening to speech (Google officially reached this threshold last year, to great excitement), so Ng is holding machines to a higher standard than humans – which is fair enough, since we tend to approach new technology and machine interfaces with a higher degree of scepticism, and are less forgiving of errors. In order to win us over, they have to really wow us.
But is pure vocal recognition the only barrier to voice search going mainstream? Let’s consider the user experience of voice search.
The UX problems with voice
As I mentioned in our last instalment on natural language and conversational search, when using voice interfaces, we tend to hold the same expectations that we have for a conversation with a human being.
We expect machines to respond in a human way, seamlessly and intuitively carrying on the exchange; when they don’t, bringing us up short with an “I’m sorry, I don’t understand the question,” we’re thrown off and turned off.
This explains why voice recognition is weighted so highly as a measure of success for voice interfaces, but it’s not the only important factor. Often, understanding you still isn’t enough to produce the right response; many voice commands depend on specific phrasing to activate, meaning that you can still be brought up short if you don’t know exactly what to utter to achieve the result you want.
hearing my roommate try and fail repeatedly to be recognized by the Google home is the slapstick comedy I needed tonight.
— BooQuixote42 (@DonQuixote_42) December 18, 2017
The internet is full of examples of what happens when our voice assistants don’t quite understand the question.
Or what about if you misspeak – the verbal equivalent of a typo? When typing, you can just delete and retype your query before you submit, but when speaking, there’s no way to take back the last word or phrase you uttered. Instead you have to wait for the device to respond, give you an error, and then start again.
If this happens multiple times, it can prompt the user to give up in exasperation. Writing for Gizmodo, Chris Thomson paints a vivid picture of the frustration experienced by users with speech impediments when trying to use voice-activated smart speakers.
One of the major reasons that voice interfaces are heralded as the future of technology is that speaking your query or command aloud is supposed to be so much faster and more frictionless than typing it. At the moment, though, that’s far from being the case.
However, while they might be preventing the uptake of voice interfaces (which is intrinsically linked to the adoption of voice search) at the moment, these are all issues that could reasonably be solved in the future as the technology advances. None of them are deal-breakers.
For me, the real deal-breaker when it comes to voice search, and the reason why I believe it will never see widespread adoption in its present state, is this: it doesn’t do what it’s supposed to.
One result to rule them all?
Think back for a moment to what web search is designed to do. Though we take it for granted nowadays, before search engines came along, there was no systematic way to find webpages and navigate the world wide web. You had to know the web address of a site already in order to visit it, and the early “weblogs” (blogs) often contained lists of interesting sites that web users had found on their travels.
Web search changed all that by doing the hard work for users – pulling in information about what websites were out there, and presenting it to users so that they could navigate the web more easily. This last part is the issue that I’m getting at, in a sidelong sort of way: so that they could navigate the web.
Contrast that with what voice search currently does: it responds to a query from the user with a single, definitive result. It might be possible to follow up that query with subsequent searches, or to carry out an action (e.g. ordering pizza, hearing a recipe, receiving directions), but otherwise, the voice journey stops there. You can’t browse the web using your Amazon Echo. You can with your smartphone, but to all intents and purposes, that’s just mobile search. Nothing about that experience is unique to voice search.
This is the reason why voice search is only ever used for general knowledge queries or retrieving specific pieces of information: it’s inherently hampered by an inability to explore the web.
It’s why voice search in its present state is mostly a novelty: not just because voice devices themselves are a novelty, but because it’s difficult to really search with it.
Even when voice devices like smart speakers catch on and become part of people’s daily lives, it’s because of their other capabilities, not because of search. Search is always incidental.
This is also why Google, Amazon and other makers of smart speakers are more interested in expanding the commands that their devices respond to and the places they can respond to them. For them, that is the future of voice.
What does this mean for voice search?
What true voice search could sound like
I see two possible future scenarios for voice search.
One, voice search remains as a “single search result” tool which is mostly useful for fact-finding exercises and questions that have a definitive answer, in which case there will always be a limit to how big voice search can get, and voice will only ever be a minor channel in the grand scheme of search and SEO. Marketers should recognise the role that it plays in their overall search strategy (if any), think about the use cases realistically, and optimise for those – or not – if it makes sense to.
Or two, voice search develops into a genuine tool for searching the web. This might involve a user being initially read the top result for their search, and then being presented with the option to hear more search results – perhaps three or four, to keep things concise.
If they then want to hear content from one of the results, they can instruct the voice assistant to navigate to that webpage, and then proceed to listen to an audio version of the news article, blog post, Wikipedia page, or other website that they’ve chosen.
Duane Forrester, VP Insights at Yext, envisages just such an eventuality during a wide-ranging video discussion on the future of voice search with Stone Temple Consulting’s Eric Enge and PeakActivity’s Brent Csutoras. The whole discussion is excellent and well, well worth a watch or a read (the transcript is available beneath the video).
Duane Forrester: We may see a resurgence in [long-form content] a couple of years from now if our voice assistants are now reading these things out loud.
Brent Csutoras: Sure. Like an Audible.
Duane: Exactly, like a built-in native Audible, like “I’m on this page, do you want me to read it?” “Yes, read it out loud to me.” There we go.
Brent: Yes because in that sense, I’m going to want to hear more. I’m driving down the street and want to hear about what’s happening and I want to hear follow up pieces.
Duane: It immediately converts every single website, every page of content, every blog, it immediately converts all of those into on-demand podcasts. That’s a cool idea, it’s a cool adaptation. I’m not sure if we’ll get there. We will when we get to the point of having a digital agent. But that’s still years in the future.
At first, I was sceptical of the idea that people would ever want to consume web content primarily via audio. Surely it would be slower and less convenient than visually scanning the same information?
Then I thought about the fast-growing popularity of podcasts and audiobooks, and realised that the audio web could fit into our lives in many of the same ways that other types of audio have – especially if voice devices become as omnipresent as many tech and marketing pundits are predicting they will.
Is this a distant future? Perhaps. But this is how I imagine voice search truly entering the mainstream, the same way that web search did: as a means of exploring the web.
The future of voice search might not be Google
What surprises me is that for all the hype surrounding voice search and its possibilities, hardly anyone has pointed out the obvious drawback of the single search result or considered what it could mean for voice adoption.
In an article for Yoast, Marieke van de Rakt highlights it as an obstacle, but believes that screen connectivity is the answer. This is a possibility, especially as Google and Amazon are now equipping their smart speakers with screens – but I think that requiring a screen removes some of the convenience of voice as a user interface, one that can be interacted with while doing other things (like driving) without pulling the user’s attention away.
For the most part, however, it seems to me that marketers and SEOs have been too content to just follow Google’s lead (and Bing’s, because realistically, where Google goes, Bing will follow) when it comes to things like voice search. Google is presenting the user with a single search result? Everyone optimise for single search results; the future of search will be one answer!
Why? What about that makes for a good user experience? Is this what search was meant to do?
I understand letting Google set the agenda when it comes to SEO more broadly, because realistically it’s so dominant that any SEO strategy has to mainly cater to Google. However, I don’t think we should assume that Google will remain the leader of search in every new, emerging area like voice or visual search.
Oh, Google is doing its best to stay on top, and there’s no denying that it’s taken an early lead; its speech recognition and conversational search capabilities are currently second to none. But Google isn’t the hot young start-up that it was when it came along and challenged the web search status quo. It’s much bigger now, and has investors to answer to.
Google makes a huge amount of revenue from its search and advertising empire; its primary interest is in maintaining that. One search result suits Google just fine, if it means that users won’t leave its walled garden.
Marketers and SEOs should remember that Google wasn’t always the king of web search; other web search engines entered the game first, and were very popular – but Google changed the game because the way it had of doing search was so much better, and users loved it. Eventually, the other search engines couldn’t compete.
The same thing could easily happen with voice search.
The logos of some of the early search engines that Google out-competed in its quest for web search dominance.
The future of voice optimisation
So where does that leave the future of voice optimisation?
Many of these eventualities seem like far-off possibilities at best, and there’s no way of being certain how they will pan out. How should marketers go about optimising for voice now and in the near future?
Though I’ve taken a fairly sceptical stance throughout this series, I do believe that voice is worth optimising for. However, the opportunity around voice search specifically is limited, and so I believe that brands should consider all the options for being present on voice as a whole – whether that’s on mobile, as a mobile voice search result, or on smart speakers, as an Alexa Skill or Google Home Action – and pursue whatever strategy makes most sense for their brand.
I’m interested in seeing us move away from thinking about voice and voice devices as a search channel, and towards thinking of them as a general marketing channel that it’s possible to be present on in various different ways – like social media.
It’s still extremely early days for this technology, and while the potential is huge, there are still many things we don’t know about what the future of voice will look like, so it’s important not to jump the gun.
Brent Csutoras sums things up extremely well in the future of voice search discussion:
“This is an important technology I really think you should pay attention to. What I worry about is that people start feeling like they have to be involved, right? It’s like, “Oh crap, I don’t want to be left behind.”
“What I would say is that in this space, it’s like the example of Instagram. Everybody wanted to have an Instagram account and they had nothing visual to show, so they just started creating crap to show it. If you have something that fits for voice search right now, then you should absolutely take the steps that you can to participate with it. If you don’t, then definitely just pay attention to it.
“This space is going to open up, it is going to provide an opportunity for just about everyone, so stay abreast of what’s happening in this space, what’s the technology, and start envisioning your company in that space, and then wait until you have that opportunity to make that a reality. But don’t overstress yourself and feel like you’re failing because you’re not in the space right now.”