Here are nine problems with voice interfaces that, arguably, have yet to be fully resolved.

Discretion

Speaking into a device cannot be done with any discretion. This is one of the reasons some people predicted voice search would never take off.

Wouldn’t users be far too bashful to be heard asking their device to order a large pepperoni pizza with a bottle of Coca-Cola and a Viennetta?

More pertinently, there is certain information that people may be understandably uncomfortable broadcasting.

Of course, in the home this argument is largely moot. And voice is not intended to replace all other input methods.

Nevertheless, there are many scenarios where voice input simply isn’t appropriate.

Privacy

Alexa is listening (for its wake word) to its adopted household 24 hours a day.

Rory Carroll explores this idea entertainingly for The Guardian. The gist is that we have to decide whether we're happy to further close the gap between our homes and the outside world.

Audio recordings of Alexa requests (and the moments before and after) are stored on Amazon’s servers, alongside Alexa’s own transcription of the request.

The Alexa app does allow users to delete this data, but doing so is discouraged, because the service should become more accurate as it gains more context.

However, this clearly raises privacy questions. Aside from the fear of people listening in or hacking our interaction history, there is concern that companies will have access to too large a portion of our lives.

In an article for The Atlantic, Kaveh Waddell argues that only when artificial intelligence can be embedded in devices (with speech analysis occurring locally) can it ‘truly become an unobtrusive and discreet helper’.

Choice

Interacting with a digital assistant solely via voice, without a screen to provide output, perhaps has the greatest implications for choice.

Though the Alexa app allows users to change default settings (for example, the news providers used when you ask Alexa about the news), voice output itself presents the user with no choice.

When it comes to commerce, Amazon says Alexa will source requested purchases from:

  • Your order history (Prime-eligible items only)
  • Amazon’s Choice (Amazon’s Choice items are highly rated, well-priced products with Prime shipping)
  • Prime-eligible items (including delivery by Prime Now for eligible items)

In the UK, where purchasing isn’t enabled, there is more inherent choice, because users must add a product to a shopping list via Echo, then use the smartphone app to select a specific product and complete the purchase.

Alexa (and any personal assistant using a voice user interface) is nevertheless uniquely positioned to dictate product choice, media choice and so on.

Of course, this ethical problem also exists in ‘traditional’ Google search results, but delivering one result instead of ten brings the issue into much sharper focus.

Thomas Hobson (of Hobson’s choice fame)

Multiple users

Imagine I own a Google Home speaker and my whole family is asking it questions. At the moment, the device only supports a single Google account.

So, my family’s media choices and shopping habits will be logged as indicative of my behaviour and will shape future recommendations (not to mention using my payment details, accessing my email account, etc.).

There are implications for the accuracy of future recommendations and contextual understanding (e.g. of my whereabouts), as well as the potential for misuse (by a cheeky guest).

Essentially, the device should be personal but plainly isn’t.

Google is working on a solution to allow multiple accounts, but surely problems will still arise unless these devices learn to differentiate between voices (or users regularly deactivate or lock their devices). 

Sophistication

Though Alexa has plenty of great reviews, it’s clear that speech analysis is nowhere near robust enough to prevent annoying UX failures, such as repeating a request five times before giving up and using a graphical user interface instead.

This is perhaps best demonstrated by Satya Nadella’s use of Cortana at 2015’s Dreamforce event. As described by Yahoo! Tech, Nadella ‘began by asking Cortana, “Show me my most at-risk opportunities.” Cortana hilariously interpreted it as, “Show me to buy milk at this opportunity.”’

On Nadella’s second attempt at the command, Cortana erroneously created a reminder of some sort.

Of course, the tech will continue to improve, and it currently works best when limited to a handful of common commands (music, shopping lists, etc.). But for someone with a terribly flat telephone voice like mine, misunderstanding is something I have to consider.

Commercial oversight and branding

It’s not just the consumer side that has issues. Google doesn’t provide voice search data in Google Search Console, so companies have little idea how customers are using voice.

Additionally, brands currently have little control over how information from their web pages is presented by a voice user interface.

Writing for Search Engine Land, Joe Youngblood describes how he ‘asked Google a question about an NFL player’s stats. The answer Google gave [him] was originally written by Rotowire and published by ESPN. When it cited the source, Google called the website “Esss-Pen.com” instead of “E-S-P-N.”’

Granted, that example was from a couple of years ago, but it demonstrates the problem at hand. 

Advertising

There are two angles to take here.

In the short term, will voice results cost certain publishers advertising revenue? Sites whose information is repackaged and presented by Alexa, for example, cannot display ads to a voice user (or solicit data, subscriptions, etc.).

The other angle is that advertising is inherently difficult to offer via a voice user interface, because promoted products and services again prompt a debate about ethics.  

Polyglots

This is probably a subset of the ‘sophistication’ argument above.

Users of Siri in English cannot currently ask for directions to places with Japanese names, because the Japanese map data isn’t accessible to them (as Sam Byford of The Verge explains).

Education

Perhaps the biggest task is educating people on how voice user interfaces work.

Though the dream of many a UX designer (and of voice pioneers in particular) is an intuitive interface that requires no learning, users currently have to get to grips with setup and a companion app.

There is also the task of educating users as to how various services integrate, what exactly is possible with devices such as Alexa, and how information is selected. 

But…

For all the downsides of voice, we shouldn’t forget that it isn’t meant to replace all other devices.

Amazon packages Echo with examples of basic queries, and these assistants are very much designed to initially manage a connected home, a calendar and a shopping list.

Greg Hart, Amazon’s vice president in charge of Echo and Alexa, tells Slate that building a voice assistant that can respond to every possible query is “a really hard problem.”

“People can get really turned off if they have an experience that’s subpar or frustrating.”

However, use cases for a hands-free, eyes-free interface are anything but scarce. So voice is likely here to stay, but there’s much still to be figured out.