Every page you visit on the Internet returns something called a status code: a three-digit code that tells the requester the status of their request for that page.
A 404 means ‘Not Found’: the server you are requesting the page from has acknowledged your request but the page you are requesting could not be found.
By default a 404 is an error message, and one that virtually every internet user has come across. 404s are not inherently bad; they exist for a very good reason.
Their ambiguous nature, however, means that search engines (and your users, and your rankings) will often benefit from some direction on what action to take when they come across one. Left unmanaged and without this direction, 404 errors are problematic.
Here are the SEO impacts and the possible solutions.
Types of status codes
Status codes can be set by the administrator of a server, or be a default server response triggered when certain criteria are met (or not met). The following are the most frequently returned status codes:
- 200 OK: The page you are requesting has been found and here it is.
- 301 Moved Permanently: The page you have requested has moved permanently from the location you’ve requested it from (Location A) to another location (Location B), and here it is.
- 302 Found: The page you have requested has moved temporarily from the location you’ve requested it from (Location A) to another location (Location B), and here it is.
- 404 Not Found: The server you are requesting the page from has acknowledged your request but the page you are requesting could not be found.
The last of these, the 404, is an ambiguous status code as the server cannot find what you are looking for but has made no attempt to contextualise why that might be.
Is it because the page was removed by the webmaster, or the URL was mistyped by a user? Is it because a malformed internal or external link was followed to the failed location from another website?
Or is it because the page was deleted or renamed, intentionally or unintentionally?
You can check whether a URL is delivering a 404 response with the URL Inspection tool in Google Search Console, or with one of the many tools, such as Screaming Frog and SEMrush, that will crawl your site and identify every 404.
The check is important because many sites have error pages that look like 404 pages, complete with a standard ‘Something went wrong’ message, but are implemented incorrectly and actually return a ‘200 OK’. Such a page looks like a 404 and reads like one, but technically isn’t one because the status code returned isn’t a 404.
That is called a ‘Soft 404’ and is far more common than you might think.
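A check along these lines can also be scripted. The following is a minimal sketch in Python using only the standard library; the function names and the list of error phrases are assumptions made for illustration, and the phrase matching is a rough heuristic rather than the way any search engine actually detects soft 404s.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Hypothetical phrases that suggest an error page; tune these for your own site.
ERROR_PHRASES = ("not found", "something went wrong", "page doesn't exist")


def fetch_status_and_body(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return the HTTP status code and decoded body for a URL.

    Note that urlopen follows redirects, so the status reported is that of
    the final URL in any redirect chain.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "status-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status and body are still available.
        return error.code, error.read().decode("utf-8", errors="replace")


def classify(status: int, body: str) -> str:
    """Label a response as 'real 404', 'soft 404 (suspected)' or 'other'."""
    if status == 404:
        return "real 404"
    looks_like_error = any(phrase in body.lower() for phrase in ERROR_PHRASES)
    if status == 200 and looks_like_error:
        # The page reads like an error page but claims success.
        return "soft 404 (suspected)"
    return "other"
```

To use it, request a URL that you know should not exist on your site, for example `classify(*fetch_status_and_body("https://example.com/no-such-page"))`, and confirm that the result is a real 404.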
What are the SEO impacts of 404s?
To be clear, 404s are not inherently bad. They exist for a very good reason and the search engines expect to see them on most sites. Their ambiguous nature, however, means that search engines (and your users, and your rankings) will often benefit from some direction on what action to take when they come across one.
Without this direction and left unmanaged, 404 errors are problematic for two reasons:
Firstly, 404s often introduce link, page and site integrity and fidelity issues. At the most basic, 404s on your site can break crawl paths and impact on accessibility, and attempts to manage 404s often create even bigger problems, for example when SEOs and webmasters make poor decisions around where to 301 redirect them.
Furthermore, a search engine must make a judgement call on a site in its entirety if it is seeing a huge number of 404s as a percentage of all pages on the site.
Secondly, a search engine allocates link equity across the pages of the internet by following links from page to page. A 404 response breaks that chain, so the search engine needs to decide algorithmically how to deal with it.
Let’s call that a ‘link sink’. The echo of ‘sunk cost’ is quite intentional, given the marketing effort that may have gone into earning a link that now ends in a 404 on your site.
With big sites, and those that have accumulated a large number of 404 pages over time, the quantity of lost link juice may be substantial. Recovering it would be a legitimate and good use of your time, as would using your default approach to 404s to pre-empt the most common problems.
Ultimately, SEOs and webmasters will very likely have existing 404 problems, deficiencies, and inefficiencies to resolve, but they also need a robust infrastructure and process that makes 404 management as self-maintaining and self-optimising as possible, particularly on huge sites.
What are the possible solutions to 404s?
SEOs and webmasters typically believe that they have four choices with regard to how to manage 404 pages.
1. Do nothing
Search engines are really smart these days, and some SEOs and webmasters believe that there’s very little value to be found in trying to manage 404s and that, assuming the site is configured properly, the search engines will pretty much take care of everything.
2. Use a soft 404 rather than a real one
The rationale here for many is that a real 404 cannot be fully exploited from an SEO perspective because, by its nature, it instructs a search engine to purge the page from its index.
With a soft 404 the page can contain links to your commercial pages and you can ‘funnel’ link equity around the site like the administrator of a complex aqueduct.
3. 301 redirect all 404 pages to the homepage
Some SEOs believe that there should be no 404s returned by the web server…ever.
This school of thought dictates that every 404 be 301 redirected to the homepage automatically as and when they materialise to preserve link equity and also give consumers a starting position if they were to come to the site via that 404.
4. 301 redirect all 404 pages to a related and relevant live page
As above, but with some logic to dictate where each 404 page should be redirected based on page relevance, funnelling link equity to a more appropriate page than the homepage.
In reality, it is a combination of those four solutions that will be right and each solution will be different depending on the site in question.
This video from Google Webmasters offers a handy overview of how to deal with 404 errors.
Guidelines to maximise SEO value
All solutions, however, must be consistent with the following guidelines to maximise SEO value:
1. Do not use soft 404s and test your 404s to make sure that they have been implemented correctly
Alternatively, you can use a 410 status code rather than 404. In 2018, Google’s John Mueller elaborated on how Google treats 404s and 410s: “From our point of view, in the mid term/long term, a 404 is the same as a 410 for us. So in both of these cases, we drop those URLs from our index.
“We generally reduce crawling a little bit of those URLs so that we don’t spend too much time crawling things that we know don’t exist.
“The subtle difference here is that a 410 will sometimes fall out a little bit faster than a 404. But usually, we’re talking on the order of a couple days or so.
“So if you’re just removing content naturally, then that’s perfectly fine to use either one. If you’ve already removed this content long ago, then it’s already not indexed so it doesn’t matter for us if you use a 404 or 410.”
So, there’s a fundamental similarity in the treatment, although 410s may drop out of the index more quickly.
You may be tempted to create a custom page that is quirky and innovative so that it can attract links, whose link juice can then be funnelled around the site via links on that custom 404.
This would only work if it were a soft 404 page, not a real one (as the search engines won’t follow links from a real 404 page).
Whilst such pages can be incredibly cool, as soft 404s they do not come with the other benefits of using real 404s (automatic housekeeping, link juice preservation, intelligent redirection for consumers, etc.).
So, if you want a custom, novelty 404, just make sure it returns a real 404 status code, and be willing to forgo the value of any links that it might attract. You should consider it viral marketing rather than SEO.
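As an illustration of a custom error page that still returns a real 404, here is a minimal sketch using Python’s built-in http.server module. The page content, the `PAGES` mapping and the function names are all hypothetical; a production site would do the equivalent in its own web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical site content for illustration: path -> HTML body.
PAGES = {"/": "<h1>Home</h1>"}

# A branded error page; the wording and link are placeholders.
CUSTOM_404_HTML = (
    "<h1>Something went wrong</h1>"
    "<p>Try the <a href='/'>homepage</a> instead.</p>"
)


def build_response(path: str) -> Tuple[int, str]:
    """Return (status_code, html) for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    # The custom page is paired with a real 404 status, never a 200.
    return 404, CUSTOM_404_HTML


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, html = build_response(self.path)
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; call serve() and request any unknown path."""
    HTTPServer(("127.0.0.1", port), SiteHandler).serve_forever()
```

The important detail is that the status code and the page body are decided together, so the friendly page can never be sent with a ‘200 OK’ by accident.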
2. 404 pages that receive traffic should be redirected
They should be redirected to the page that is most appropriate to their original topic, and one that will not jar with human users when they land on it.
Always remember that in many cases you aren’t just redirecting pages and search engines, but real people, with real money, and real buying intent.
3. 404 pages that have inbound links from other websites should be redirected to pages that are consistent with the anchor text mix of the links
4. Leave 404 pages that have no traffic or link value as they are and the search engines will purge them from the index
If speed is of the essence then a 410 may slightly expedite matters. Do remove all links to those pages from your own site, though, to conserve link equity and improve your user experience.
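Guideline 3 can be approximated in code. The sketch below, in Python, picks a redirect target by counting how many words from the inbound anchor texts also appear in a short description of each candidate page. The function names, the sample data and the word-overlap scoring are assumptions made for illustration; it is a crude stand-in for a proper relevance judgement, not an established method.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def tokens(text: str) -> Counter:
    """Lower-case word counts, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def best_redirect_target(anchor_texts: List[str], candidates: Dict[str, str]) -> Optional[str]:
    """Pick the candidate URL whose description shares the most words with the anchors.

    Returns None when no candidate shares a single word, in which case
    leaving the 404 in place may be the safer choice.
    """
    anchor_counts: Counter = Counter()
    for text in anchor_texts:
        anchor_counts.update(tokens(text))

    best_url, best_score = None, 0
    for url, description in candidates.items():
        page_words = set(tokens(description))
        # Score = how many anchor-text word occurrences also appear on the page.
        score = sum(count for word, count in anchor_counts.items() if word in page_words)
        if score > best_score:
            best_url, best_score = url, score
    return best_url


# Hypothetical data: anchor texts of links pointing at a dead URL,
# and short descriptions of live pages that could replace it.
anchors = ["red running shoes", "cheap running shoes", "shoes"]
live_pages = {
    "/shoes/running": "running shoes for men and women",
    "/shoes/formal": "formal leather shoes",
    "/": "online clothing store",
}
print(best_redirect_target(anchors, live_pages))  # prints /shoes/running
```

Returning nothing when there is no overlap is deliberate: a redirect to an unrelated page is the kind of poor decision that creates bigger problems than the 404 itself.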
An approach tailored to the guidelines above can be implemented in a number of ways, including by building a custom 404 handler.
A custom 404 handler is your own code that runs whenever your server deals with a 404, and it can apply conditional logic before returning the 404 message. For example, you could code your 404 handler to look up the requested URL in a database of redirects to determine where to send it.
You could even have the 404 handler check whether the URL receives traffic. If it does, the handler redirects it to the page with the closest anchor text link profile; if it receives neither traffic nor links, the handler leaves it be.
There are practically no limitations to the power of a customised 404 handler, other than whether the benefit justifies the coding effort.
Typically it is monstrously large and complex websites that would benefit most from that level of automated intelligence. All sites large or small will however benefit from an optimal and consistent approach to the management of 404s.
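The decision logic of such a handler might look like the following minimal Python sketch. Everything named here is hypothetical: the `redirect_map` stands in for the database of redirects, `stats` for your analytics and backlink data, and `gone` for a list of URLs you have chosen to mark as permanently removed with a 410.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class UrlStats(NamedTuple):
    visits: int = 0          # recent visits to the dead URL
    inbound_links: int = 0   # links from other websites


def handle_missing_url(
    path: str,
    redirect_map: Dict[str, str],
    stats: Dict[str, UrlStats],
    gone: Set[str],
) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a URL that no longer exists."""
    if path in gone:
        # Deliberately removed content: a 410 may drop out of the index slightly faster.
        return 410, None
    info = stats.get(path, UrlStats())
    has_value = info.visits > 0 or info.inbound_links > 0
    target = redirect_map.get(path)
    if has_value and target is not None:
        # The URL still has traffic or link value and a relevant page is known.
        return 301, target
    # No traffic, no links, or no suitable target: leave it as a real 404.
    return 404, None
```

A usage example with made-up data:

```python
redirect_map = {"/old-shoes": "/shoes/running", "/old-hats": "/hats"}
stats = {"/old-shoes": UrlStats(visits=120, inbound_links=4)}
gone = {"/discontinued"}

handle_missing_url("/old-shoes", redirect_map, stats, gone)     # (301, "/shoes/running")
handle_missing_url("/old-hats", redirect_map, stats, gone)      # (404, None)
handle_missing_url("/discontinued", redirect_map, stats, gone)  # (410, None)
```

Keeping the decision in one pure function makes the policy easy to test and to change without touching the server code that sends the response.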
Pros and cons of 404 solutions
This article was updated in September 2019.