– Generative AI technology is already being used in customer service chatbots in a bid to make them more conversationally fluent when interacting with customers.
– Intercom and Yext, two companies testing this technology, are relying on measures such as resolution-rate benchmarks, grounding responses in each brand's own voice and data, and, of course, extensive testing with a select number of clients before launch.
– Both companies envisage a future in which humans collaborate with AI to deliver more effective customer service, but stress the need for care and caution as the technology is deployed.
Since the generative AI chat tool ChatGPT launched in late November 2022, and particularly since it was made widely available to developers via an API in March 2023, many businesses have integrated the technology into their products and services.
One standout use case is chatbots. From its release, ChatGPT demonstrated impressive conversational capabilities, including the ability to respond to complex, multi-part requests and to retain context from much earlier in a conversation. The result is a more intuitive and natural exchange than earlier chat technology could manage.
Chatbots have the potential to take the weight off human customer support agents and expand brands' self-service capabilities, but many existing chatbots are limited in their functionality, which makes them frustrating to interact with. “Within three engagements, you can probably tell that you’re talking to a robot that gives you canned … answers,” says Nico Beukes, Managing Director at Yext. “My patience with [traditional] chatbots is very low; I would be the first person to stop engaging on that channel and try and call or find a different way to engage with that brand.”
Yext is one of the companies looking to change this by using generative AI to build a conversational tool, Yext Chat, that can fluently answer questions about businesses. Business messaging provider Intercom, an early adopter of generative AI in its customer support toolset, has also been trialling a generative AI chatbot, Fin, which uses GPT-4, the successor to the GPT-3.5 model that powers ChatGPT.
How are the two companies testing the effectiveness of these tools, and in particular, what guardrails have they put in place to mitigate the well-known issues that generative AI has with ‘hallucinations’? I spoke to Yext’s Beukes along with Fergal Reid, Senior Director of Machine Learning at Intercom, to learn more.
How to safeguard generative AI chatbots
Large language models (LLMs) such as GPT-3.5, which powers ChatGPT, and GPT-4 are highly capable of generating fluent, even human-sounding prose. However, it’s a mistake to think of these models as able to reason or to understand what they are responding to. They are trained on vast amounts of text to produce the most plausible-sounding response, which is not necessarily the correct one.
This makes them excellent at producing blog posts, marketing copy, and even poetry and song lyrics on demand. They are unpredictable with facts, though, and often confidently state incorrect information (a phenomenon known as ‘hallucination’). For a static piece of content that can be edited and fact-checked before publication, this isn’t a huge issue. But what about a chatbot that interacts with customers in real time?
Intercom was quick to integrate GPT-3.5 into its customer service toolset, recognising the potential of the technology for helping customer service teams, but initially held back from including it in anything customers might interact with directly, instead restricting it to behind-the-scenes use.
“At that point, we felt the technology would be best suited for ‘human in the loop’ features, due to the widely cited issues with hallucinations and made-up answers,” says Fergal Reid. “We quickly built and shipped many prototypes internally, ensuring we learned and iterated with each round of user feedback.”
That was in January; but by March, Intercom’s approach to generative AI had already changed. “We got an early peek into GPT-4 and were immediately impressed with the increased safeguards against hallucinations and more advanced natural language capabilities,” Reid recalls. “We felt that the technology had crossed the threshold where it could be used in front of customers.”
Resolution rates and benchmarks
This led to the creation of Fin, which is built on GPT-4 and was subsequently launched in private beta with a select number of Intercom’s clients for further testing.
“From the start, we wanted to build a bot that could converse naturally, answer questions using only information sourced from a company’s help pages, reduce hallucinations, and require minimal setup,” Reid says of building Fin.
Intercom has prior experience building customer support bots. Its Resolution Bot, for example, is billed as able to resolve “33% of common questions” and to automatically recognise questions similar to past conversations so that it can supply the best response. That experience has given the company additional tools for evaluating Fin’s accuracy and reliability.
“Over the years of building customer support bots, we’ve developed expertise for evaluating them,” says Reid. “We’ve also recently built benchmark datasets that we can use to specially test large language models, particularly targeted at the type of hallucinations we want to avoid in customer service use cases.”
The team is also using resolution rates to measure Fin’s performance, and Reid says that so far, these are “exceeding our expectations”.
“We’re actually getting higher resolution rates with Fin than we’ve previously had with our Resolution Bot product, even though [Resolution Bot] required a lot of manual curation,” he explains. However, Reid stresses that it’s “still early days” for Fin, adding, “We’re … continuing to pay attention to Fin’s ability to communicate in a human-like way and adhere to the guardrails we put in place to avoid misleading information and hallucinations.”
Adhering to brand voice and learning over time
Yext Chat has similarly been in closed beta since February and is being tested with a number of Yext clients who signed up for early access in exchange for their feedback on the developing product. “We’re running a beta – like we do with any new products – with a decent set of existing customers who give us their feedback, identifying opportunities and areas of improvement,” says Beukes.
“The beta process is where we ensure that we get [Yext Chat] market-ready. Our customers are aware that they’re part of a beta – you sign up, and you regularly give your feedback as part of that group … and in return for that, you get a first-mover advantage.”
In addition to the safeguards that OpenAI, the organisation behind ChatGPT and GPT-4, has already built into its language models to prevent abuse, Beukes points to Yext’s Knowledge Graph as an important means of safeguarding the technology. The Knowledge Graph is Yext’s proprietary CMS, which stores information about each client brand and its unique attributes. Yext Chat uses this database as its source for both information and tone.
“The reference point will be that particular customer’s Knowledge Graph … and [Yext Chat] will talk in a company voice, which reduces hallucinations,” says Beukes. He also highlights the fact that the technology is capable of learning, refining and improving its responses over time.
“If you ask it a question now, at the hundredth [response to that] question, it is actually going to give you a better answer because it learns [based] on user responses.”
“In order to be more effective, we know that we need AI”
Yext has not put a specific timeline on its launch of Yext Chat, stressing the need to carry out testing until it is market-ready. “You have to ensure that everything is … optimised for best performance,” says Beukes. “It is often hard to put a timeframe to that; nothing should be launched before it’s fully tested.”
Beukes uses the word “coexistence” to describe his vision of humans working with AI tools going forward – but stresses that they should be approached with care. “In order to be more effective, we know that we need AI. We’re used to AI, now; our world is full of it,” he says. “But AI can’t operate without us making sure of its authenticity: that it’s not hallucinating, that it’s not harmful.
“This is the way we view using AI … we provide technology to our customers, and it’s going to make a huge difference to their organisations – but there’s a recognition as this content is only as good as its [human] approval for public consumption.”
Reid describes a similar vision of human agents and AI complementing one another to achieve the best outcomes for customer service. “Ultimately, we believe in a future that is automation-first, but not automation-only – we see support teams where humans and bots will work alongside each other, playing into each other’s strengths to deliver the best possible support,” he says.
“For companies of all sizes, retaining current customers in this environment is crucial” – and Reid believes that “fast, helpful customer support, enabled by the power of generative AI, is certainly a key to that.”