In late November, OpenAI, the research laboratory behind the Generative Pre-trained Transformer (GPT) language model, quietly released the latest version of GPT: GPT-3.5. Considerable speculation had surrounded when OpenAI might launch GPT-4 – thought to be the next model in the series – after GPT-3 was introduced in May 2020.
While GPT-4 is still in development, GPT-3.5 is already making waves, primarily thanks to something called ChatGPT: a sophisticated chatbot underpinned by GPT-3.5. Since the early demo of ChatGPT was released on 30th November, users have enjoyed prompting ChatGPT to produce everything from sonnets about string cheese to fake Twitter code (as well as real code) to faux corporate memos, and challenging its knowledge on a range of subjects.
ChatGPT’s fluency and facility in answering questions of all kinds have prompted many to conclude that it could pose a serious threat to Google (and by extension, search engines more generally, with Google being the dominant search engine worldwide), with one Twitter user declaring, “Google is done.”
While other publications and commentators have taken a milder stance, several have proposed that it could still eat into Google’s share of search: Alex Kantrowitz, founder of Big Technology, told the What Next: TBD podcast, “It’s not going to replace search. But even if it takes 5% of Google’s market share, that’s a huge number.” Seeking Alpha, commenting on the possible impact on Google’s stock market prospects, speculated that, “This technology may replace various user scenarios that previously started with the Google search box.”
The idea of search taking a more conversational format has been discussed for years, with many believing that the advent of smart speakers (and their assistants) like the Amazon Echo and Google Home would herald a revolution in conversational voice search. This has yet to materialise, but could ChatGPT be the innovation that changes all this? Let’s look at how ChatGPT measures up as a substitute for search, what the strengths and weaknesses of ‘conversational’ search are, and what this could mean for how we search in the future.
Does ChatGPT make an effective search engine?
So far, the version of ChatGPT that has been made available to the public is still a demo, and so there is still potential for it to be improved and for its weaknesses to be addressed in future versions. As it stands, however, how well does ChatGPT serve as a search engine?
ChatGPT has demonstrated the ability to respond effectively to a number of fact-based queries, such as, “Who owns Google?” and “Who [has] escaped from Alcatraz?” One user quizzed ChatGPT on “What is the child tax credit?” (without specifying in which country, although ChatGPT appears to have assumed the United States), and noted that ChatGPT’s UX was preferable to Google’s as it gave a direct answer with “no clicking or scrolling links” and provided follow-up answers and definitions. However, the answer ChatGPT provided was outdated, as its training data does not cover events beyond 2021.
This is one obvious limitation of ChatGPT as a ‘search engine’, unless there is a way for its model to be updated either in real time or at frequent enough intervals that it can always be counted on to offer reasonably up-to-date information. Similarly, the chatbot does not have internet access, and so will note in response to some queries that it cannot access information that is not part of its training data, adding, “It is important for users of my services to keep this in mind and to verify any information that I provide against reliable external sources before using it.”
This is a major limitation right off the bat, as you would essentially need to use a search engine to verify information from ChatGPT, which makes it a poor substitute for one. However, this could theoretically be tackled if the bot were given access to the internet (or, as mentioned, somehow updated in near-real-time).
The mention of sources alludes to another major weakness of ChatGPT: it never provides a source for its answers (presumably because these are synthesised from a blend of various different pieces of information), which makes them challenging to verify. Writing for Fortune, Steve Mollman noted that, “[ChatGPT is] sometimes flat-out wrong while sounding completely confident about its answer. But as long as you’re aware of this, ChatGPT can be a useful tool – much as Wikipedia can be useful as long as you take its crowdsourced entries with a grain of salt.” However, the crucial difference between ChatGPT and Wikipedia is that Wikipedia does source things (or else flags up a lack of sources with “[citation needed]”), thus allowing readers to identify where the information came from and check its origins for themselves.
When ChatGPT is right, it can be extraordinarily helpful, able to parse questions that are phrased in a way that a human would phrase them and respond in kind, providing conversational yet comprehensive answers and formatting the information in an accessible way, using bullet points or step-by-step instructions. It can even adjust the register of its explanations when requested, phrasing something in terms a six-year-old would understand one moment and in terms suited to an expert the next.
Google has been striving to achieve this “single, comprehensive answer provided directly on the search page” outcome for years: that’s the entire goal of Featured Snippets (or for some searches, the Knowledge Panel), and Featured Snippets even prioritise content that is laid out in an accessible format like a bullet-point or numbered list. The “People Also Ask” feature also offers a semblance of follow-up questions about a topic. However, Google is limited to drawing text directly from the pages it indexes, unlike ChatGPT, which can absorb the information and then present it back to the user in the most intuitive way.
In this sense, it’s easy to see why ChatGPT is being heralded as a potential Google-killer. Yet ChatGPT’s shortcomings in the realm of providing information are currently significant enough to nullify its potential usefulness in that regard. It can be wrong about some fundamental things, like solutions to equations and the fastest marine mammal: errors that can only be identified if the asker already knows the correct answer. (It’s easy to note that a peregrine falcon is not a marine mammal, but if ChatGPT had responded with a type of marine mammal other than the common dolphin – which is the correct answer to the question – the asker might not have known to challenge this). Verifying these requires access to a correct source for the information, which defeats the object of using ChatGPT.
But assuming that these shortcomings could be addressed, so that either ChatGPT’s accuracy could be guaranteed or it came with a fact-checking mechanism built in, would ChatGPT then be able to supplant Google? What are the advantages of search in a conversational format – and are there any disadvantages?
The strengths and weaknesses of conversational search
Since the early days of search, developers have created search engines designed to be able to parse conversational search queries – otherwise known as ‘natural language’ search queries. Many people will remember Ask Jeeves, a 1996 search engine that encouraged users to phrase their searches in the form of a question (“Where can I find a currency converter?”) instead of keywords (“currency converter”).
Another early natural language search project, online since 1993, is the START Natural Language Question Answering System, developed by MIT’s Computer Science and Artificial Intelligence Laboratory. While its interface resembles a search engine, START is actually more like a proto-ChatGPT: its description states, “Unlike information retrieval systems (e.g., search engines), START aims to supply users with “just the right information,” instead of merely providing a list of hits.” An About page details why this is helpful: “In this way, START provides untrained users with speedy access to knowledge that in many cases would take an expert some time to find.”
While online search has become much more sophisticated in the almost three decades since START was first developed, making it less likely that an “expert” would be needed to find the relevant information, the popularity of ChatGPT shows that being supplied with “just the right information” still has widespread appeal.
However, being able to answer any conceivable question in any conceivable wording is a highly sophisticated computational task, since it requires the ability to understand how different words and parts of speech relate to each other and come together to form a whole, and then the ability to retrieve the correct piece of information in response to that. Ask Jeeves and START were ahead of their time, but both had their limitations; major search engines like Bing and Google didn’t start trying to tackle more complex, multi-part natural language queries until the early-to-mid-2010s (2011 for Bing, and 2015 for Google).
An image from Google illustrates the complexities involved in a multi-part search query, and the individual pieces of knowledge that are required to arrive at the correct answer. (Image: Google Inside Search)
However, the goal of true conversational search is desirable enough for major search companies to sink a considerable amount of time and resources into perfecting it. Here are some of the things that make conversational search so appealing:
Accessibility – speaking to computers in ‘human’ language
As START highlighted in its project description, conversational search is more accessible to the “untrained” user: rather than needing to think about what keywords are most likely to return a relevant search result, searchers can phrase their question in the way they would ask a human and have it be understood by the search engine.
Even though the general public has much more day-to-day familiarity with computers in 2022 than it would have had in 1993 when START was created, online searching is still a skill that takes time to learn, and often a search can take several iterations to refine as the searcher tries different phrases that may return what they are looking for. In an ideal world, a ‘true’ conversational search interface would be able to interpret the question correctly, regardless of how it is phrased, and return the right answer. While this is not an easy task, ChatGPT has so far come the closest that we’ve seen to achieving this.
Multi-part queries and follow-up questions
“Who was the US president when the Angels won the World Series?” is a single question, but it contains a lot of different component parts, and most search engines would struggle to identify that the first half (the identity of the US president) is dependent on the second (when did the Angels win the World Series?), potentially returning the wrong information because all variables weren’t taken into account.
To be sure of getting the right answer, most searchers would need to split this up into two queries – “When did the Angels win the World Series?” (or in true keyword format, “Angels World Series wins”) and then “Who was the US President in 2002?” However, a search engine that can parse natural language searches can understand how those pieces of information relate to each other and only need one question to produce the correct answer.
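To make the dependency between the two halves of the question concrete, here is a minimal Python sketch. It is an illustration only, not how Google or ChatGPT actually resolve such queries: the two lookup tables and the function name `president_when_team_won` are hypothetical stand-ins for a real search index.

```python
# Toy lookup tables: hypothetical stand-ins for a real search index.
WORLD_SERIES_WINS = {"Angels": 2002}
US_PRESIDENT_BY_YEAR = {2002: "George W. Bush"}


def president_when_team_won(team):
    """Answer the multi-part question by resolving the dependent part first."""
    # Step 1: "When did <team> win the World Series?"
    year = WORLD_SERIES_WINS[team]
    # Step 2: "Who was the US President in <year>?" depends on step 1.
    return year, US_PRESIDENT_BY_YEAR[year]


year, president = president_when_team_won("Angels")
print(f"The Angels won in {year}, when the US President was {president}.")
```

The second lookup cannot run until the first has produced a year. A keyword-based engine that treats the whole question as a single bag of words has no such ordering to follow.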
In a conversation, it is also possible to ask follow-up questions without restating the context, because your conversation partner already understands what the topic is. (“What year did the Angels win the World Series? And who was the US President then?”) This is also possible with conversational search, allowing searchers to seamlessly learn more about a topic through follow-up queries, or ask related questions without needing to restate the context.
Most search engines treat each search as a separate, unconnected query, although Google has been improving its ability to retain context across multiple successive queries, such as “Who is the King of England?” “How old is he?” when the searcher is using voice search. ChatGPT has also shown itself capable of retaining context over numerous follow-up questions – this makes sense, as it specialises in conversational interactions, but it also opens up new possibilities for fact-finding.
Google’s voice search is capable of interpreting follow-up queries without needing the context re-stated, making for a more natural “conversation” style of search.
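As a rough illustration of what “retaining context” involves, the sketch below remembers the most recent entity and substitutes it for a pronoun in the follow-up query. The `ConversationContext` class is hypothetical and deliberately naive; it does not reflect how Google or ChatGPT implement this, and real systems resolve references far more robustly.

```python
import re


class ConversationContext:
    """Remembers the most recent entity so follow-ups can omit it."""

    # A deliberately small pronoun list, for illustration only.
    PRONOUNS = re.compile(r"\b(he|she|they|it)\b", re.IGNORECASE)

    def __init__(self):
        self.last_entity = None

    def remember(self, entity):
        self.last_entity = entity

    def resolve(self, query):
        # With no stored context, the query is passed through unchanged.
        if self.last_entity is None:
            return query
        # Replace the first pronoun with the remembered entity.
        return self.PRONOUNS.sub(self.last_entity, query, count=1)


context = ConversationContext()
context.remember("Charles III")  # the answer to "Who is the King of England?"
rewritten = context.resolve("How old is he?")
print(rewritten)
```

Without the stored entity, “How old is he?” is unanswerable on its own, which is why engines that treat every search as a fresh query cannot handle it.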
A definitive answer
Many users of ChatGPT have cited the experience of receiving a single, definitive response to their query as preferable to the experience of hunting down the information from multiple possible results, especially when some of those results are ads.
Giving a definitive answer to a question that might have lots of variables isn’t easy, of course, and major search engines still can’t do this for the majority of queries. ChatGPT is unusual in its ability to synthesise information to produce a single response, often laying out multiple sides of a complex issue.
There are drawbacks to the “single answer” search result, however, since it prevents searchers from drawing their own conclusions from the available information, instead presenting what ChatGPT (or the search engine) interprets to be “true”. AI and algorithms are extremely susceptible to bias, even if they are perceived as objective and rational, and so there is a danger of ChatGPT or a similar program presenting a flawed narrative in response to a complex or sensitive question, without any room for the searcher to weigh the evidence for themselves.
No onward journey
The biggest drawback of voice search (the most common mode of conversational search) from a user experience perspective has always been the lack of an onward journey. Users can listen to the answer to their question, but there’s no way to navigate to the originating website to learn more. While some attempts have been made to solve this problem, such as a trial by Google in which the Google Assistant would read part of a news article aloud and send links to the user’s mobile phone, they have yet to be implemented on a widespread scale.
The result of this is that searching becomes an isolated event: users can ask a question and receive an answer, but unless they have other questions, there’s no reason to use the search engine, or voice assistant in the case of voice search, any further. Additionally, because conversational search provides a single answer and not a list of results, there are many search use cases that it cannot fulfil. Web search was originally designed as a means of making it easier to find websites to visit – in fact, the earliest ‘search engines’ were more like website directories – and many searches are still conducted with this goal. For someone searching for “Christmas gifts under £5”, a list of links to websites is a desirable outcome, rather than a hindrance.
ChatGPT excels at the ‘information finding’ genre of search, but for the ‘website finding’ genre of search, it’s difficult to see how it would supplant web search. On the other hand, studies have indicated that “informational” searches – searches where the goal is information – make up the majority of web searches, with the percentage estimated at more than 80% by one study in 2007. (While you’ll notice this is not a particularly recent stat, it appears to be the most up-to-date figure). This wouldn’t leave search engines with a great deal to divide up between them – particularly when you account for those searches that have already been taken over by more specialised ‘vertical’ search engines or product websites like Amazon.
Conversational search is also extremely difficult to monetise. This is much more of a drawback for search engines (primarily Google, whose business model revolves around advertising) and search marketers than for end users, many of whom would no doubt be delighted to never encounter another ad. If a search only yields a single result, then having that result be paid for or sponsored would be hugely damaging to user trust.
Search advertising is only effective when the searcher has a choice of results, which gives them the option to click or not click on a sponsored result. Onlookers have correctly identified that following in the footsteps of ChatGPT would be a disaster for Google’s business model: in the most recent earnings report for Google parent company Alphabet, search accounted for 57% of Google’s total revenues ($39.5 billion out of a total $69.1 billion). While other revenue models for search engines do exist, such as the money that privacy-first search engine DuckDuckGo makes from affiliate partnerships, search advertising is the most common source of revenue, and so removing it would present a profitability problem for many search engines.
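For anyone who wants to check the share quoted above, the arithmetic is simple enough to sketch in a few lines of Python. The two figures are taken from the text; the variable names are my own.

```python
# Figures quoted from the Alphabet earnings report, in billions of US dollars.
search_revenue_bn = 39.5
total_revenue_bn = 69.1

# Search's share of total revenue, as a fraction.
search_share = search_revenue_bn / total_revenue_bn

print(f"Search share of total revenue: {search_share:.0%}")
```

The result rounds to the 57% quoted.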
A model for the best of both worlds?
While carrying out research for this article, I came across a tool that could offer a model for the ‘best of both worlds’ between chat-based conversational search and results-based web search: Andi Search. Andi bills itself as “search for the next generation”, combining generative AI-based chat with more traditional-style web search results, so that each direct answer to a question is combined with links to learn more (and also has a direct source that you can click on and read for yourself).
It’s a fun tool to play around with, and the UX is good: neither the chat-based responses nor the search results interfere with one another, and search results are presented in an attractive ‘card’-style format with an image and a short text blurb. Each one leads the searcher to the website it’s drawn from, and some even have a “read” button that opens the article in a scrollable box instead of a new browser tab.
There’s an Images tab for relevant image results, and some searches will also produce a News tab; its creators also say that they have “lots of work to do on shopping and product review searches”, while location searches are “basic but improving”. Similar to a voice assistant, Andi also responds to commands like “Play The Beatles on Spotify” (although this doesn’t directly open Spotify, but produces a search result that can be used to open Spotify) and can navigate to websites (for example, the command “Go Facebook” will open Facebook in a new tab).
Andi Search invites the user to input the command, ‘play the beatles on spotify’. Image: Andi Search
One thing that isn’t clear is what exactly powers Andi’s search engine: most niche search engines, like DuckDuckGo or Ecosia, don’t develop their own search technology, instead being powered by technology from a bigger search engine like Bing (which powers both DuckDuckGo and Ecosia, although prior to 2017, Ecosia used a mix of Yahoo!, Wikipedia and Bing in its search results). Asking Andi about this tends to produce responses about its ad-free search (Andi has a firm anti-advertising stance, and plans to sustain itself in future via a freemium business model). However, a tweet from one of its founders indicates that Andi combines “semantic search” with LLMs (Large Language Models) and live data from the web and APIs.
Andi is still in alpha, and I will note that its results are not always accurate – a search for “What is the fastest marine mammal?” (a surprisingly effective test of generative AI) returns the answer of the Dall’s Porpoise, which Andi claims is faster than the Common Dolphin with speeds of up to 35km per hour, or 22 miles per hour, citing Whale and Dolphin Conservation USA as its source. However, Whale and Dolphin Conservation USA states that the Dall’s Porpoise can reach speeds of up to 34 miles per hour, which is 55 kilometres per hour.
Andi Search gets a little confused about the fastest marine mammal (to be fair, so am I). Image: Andi Search
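The mismatch between Andi’s answer and its cited source is easy to verify with a quick unit conversion. The sketch below assumes only the standard definition of the international mile (1.609344 km); the function names are my own.

```python
KM_PER_MILE = 1.609344  # international mile, in kilometres


def mph_to_kmh(mph):
    return mph * KM_PER_MILE


def kmh_to_mph(kmh):
    return kmh / KM_PER_MILE


# Figure attributed to Whale and Dolphin Conservation USA: 34 mph.
source_kmh = mph_to_kmh(34)
# Figure given in Andi's answer: 35 km/h.
andi_mph = kmh_to_mph(35)

print(f"34 mph is about {source_kmh:.0f} km/h")
print(f"35 km/h is about {andi_mph:.0f} mph")
```

Both pairs of figures are internally consistent (34 mph is roughly 55 km/h, and 35 km/h is roughly 22 mph), so the discrepancy is in the speed Andi reported and not in a faulty unit conversion.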
The source and link make it possible to check this, but it lends more weight to the need for conversational search tools (Andi calls itself a “synthesis engine”, i.e. synthesising information from multiple sources to produce an answer) to be reliable – because people won’t necessarily visit the original source to fact-check the answer. Granted, knowing the identity of the fastest marine mammal may not be world-changing, but if a tool like this became more widely used, a lack of accuracy could become problematic.
Could Google get ahead of the competition?
Google is no stranger to the powers of AI – it has had a dedicated AI division, Google AI, since 2017, which, among other projects, produced LaMDA, a family of conversational neural language models. The most up-to-date version of LaMDA has 137 billion parameters, compared to the 175 billion reported for GPT-3, the model family on which ChatGPT is built.
At a recent all-hands meeting, Google executives were reportedly asked whether ChatGPT’s viral success was a “missed opportunity” for Google, given that it has had its own conversational AI in LaMDA “for a while”. In response, Alphabet CEO Sundar Pichai said that Google needs to “balance” the desire to be bold with the need to be responsible; Google AI lead Jeff Dean added that the company is moving “more conservatively than a small startup” due to the “reputational risk” involved. “It’s super important we get this right,” he said.
Many have predicted that Google’s size and status as an industry behemoth will work against it when it comes to defending against potential disruptors, making it difficult to move fast enough or take the necessary risks to innovate. It may also be significant that OpenAI is backed by Microsoft, owner of search competitor Bing. There has yet to be any talk of Bing integrating ChatGPT into search, although in a related move, Bing has begun integrating an image generator powered by Dall-E 2 into its search engine, further blurring the boundaries between search and generative AI.
(Update January 2023: Since publication of this article, The Information has reported that Microsoft is preparing to integrate ChatGPT into Bing, though Microsoft has yet to officially confirm this.)
Another player on the field is Meta, which launched its own conversational AI prototype, BlenderBot, to the public in August. Unlike ChatGPT, BlenderBot is connected to the internet, and its messages can be clicked to learn more about what generated the response. However, it mostly made headlines for producing insults about CEO Mark Zuckerberg, and Meta researchers have acknowledged that the bot has “a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt”.
Despite the buzz around ChatGPT, no chatbot contender is without its issues, and it remains to be seen whether the problems with the technology can be ironed out, or whether generative AI chatbots will always be a flawed replacement for search engines. This will be an interesting area to watch in 2023, particularly with rumours flying around the possibility of an imminent launch of GPT-4.