My last few posts have been fairly serious, so I figured I would mix in a little fun with this one. I’m soliciting input too: there’s a £50 prize for the best contribution (see the end of the post).

I’ve seen a spate of “Easter Egg” links doing the rounds on Twitter recently, some of which have been pretty inspired. They’re generally either the 404 page or the robots.txt page.

In this post I’ll talk a little about what these pages are actually for and share some of my favourites among the new ones I’ve seen recently.

404 page, no juice available

The 404 page is what your browser shows you when the page you asked for doesn’t exist (here is our 404).

Generally, if a user follows a link to a non-existent page, the site should make it clear to the visitor that they have followed a broken link and tell the search engines that the page doesn’t exist (by returning a 404 status code). So far, so straightforward.

A temptation arises if one has a “fun” 404 page that attracts links and social media traffic from people who find the page amusing.

Any internet marketer knows that links help sites rank better in search engines (and social media signals may well too) and rankings mean business.

The temptation is therefore to mask the fact that this is a non-existent page by returning a 200 “OK” code, so that the search engine is fooled and the site still gains the ranking benefit.

Google calls such pages “crypto 404s” or “soft 404s” and, because it doesn’t want to send searchers to broken pages, it tries to identify such pages and remove them from the index regardless.

Whilst I haven’t read anything that says Google punishes such behaviour in terms of site rankings, it’s duplicitous so it’s hardly sending Google the most trustworthy signals about your site and your behaviour. 
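If you want to check that your own site isn’t serving soft 404s by accident, something along these lines might do as a starting point. It is only a rough sketch: the function names and the idea of requesting a random made-up path are my own assumptions, not anything Google publishes.

```python
import urllib.error
import urllib.request
import uuid


def fetch_status(url: str) -> int:
    """Return the final HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


def classify_missing_page(status: int) -> str:
    """Interpret the status a site returned for a URL known not to exist."""
    if status in (404, 410):
        return "ok: missing page reported as missing"
    if 200 <= status < 300:
        return "soft 404: missing page served with a success code"
    return "other: inspect manually"


def probe(site_root: str, fetch=fetch_status) -> str:
    """Request a random path that should not exist and classify the answer."""
    # Hypothetical approach: a random hex string is very unlikely to be a real page.
    bogus_url = site_root.rstrip("/") + "/" + uuid.uuid4().hex
    return classify_missing_page(fetch(bogus_url))
```

Calling `probe("https://www.example.com")` (a placeholder address) makes one real request. The `fetch` argument is there so the logic can be tried without touching the network.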

404s are definitely something that should be monitored and addressed if possible. In Google’s Webmaster Central you can pull a list of all the links Googlebot has followed to “pages” on your site that returned a 404.

While having many links pointing to non-existent pages on your site is not, as far as I’m aware, penalised by Google, it could still have a negative impact: first through the loss of the links’ ranking “juice”, and secondly through worse browsing stats, such as bounce rates, that Google is believed to collect via its toolbar and factor into its ranking algorithms.

To fix 404s you can create a page at that location where relevant, set up a permanent (301) redirect from the non-existent page to a relevant page, or contact the site where the broken link is sitting and ask them to amend the link to point to an appropriate page. It’s boring but important.
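To make the redirect option concrete, here is a minimal sketch of the decision a server makes for each request. The paths and the `respond` function are invented for illustration; on a real site this mapping would normally live in your web server or framework configuration.

```python
from typing import Optional, Tuple

# Hypothetical pages that exist on the site.
LIVE_PAGES = {"/", "/flowers/roses", "/christmas"}

# Hypothetical map of dead paths to their closest live equivalents.
REDIRECTS = {
    "/old-roses": "/flowers/roses",
    "/xmas-2009": "/christmas",
}


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target or None) for a requested path."""
    if path in LIVE_PAGES:
        return 200, None
    if path in REDIRECTS:
        # 301 tells browsers and crawlers the move is permanent.
        return 301, REDIRECTS[path]
    # Anything else is genuinely missing, so say so honestly.
    return 404, None
```

The point of the ordering is that only paths with a genuinely relevant replacement get a 301. Everything else still returns an honest 404.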

If you have taken care of the above, there’s no reason you shouldn’t try to make the page fun and memorable, perhaps garnering some social media visits and driving general brand awareness.

Here are five 404s that I’ve liked recently, in ascending order of weirdness:

Simple and to the point:

Vzaar’s 404:

 from Distilled:

Trying way too hard but pulls it off:

This one is my favourite for its pure weirdness:

Econsultancy’s CEO wipes out:

Robots.txt, juice available

The robots.txt file is a page of plain text on a website (here is our robots.txt) which is read by search engine bots and other automated web crawlers to understand which pages the site owner does not want them to crawl and index.

It is very common, for example, to “disallow” crawling of anything to do with the basket and any pages where a customer might be signed in to a “My Account” area or equivalent.

Whilst it is unlikely that a bot would get into an account area requiring sign in, there is no harm in disallowing it anyway, just in case; having Google crawl all your customers’ private pages and put them in its index would not make you very popular with your customers.

You can also use robots.txt to ask certain robots not to crawl your site (though they might ignore you and do it anyway) – we have blocked a few that kept crawling and wasting valuable server resources during peak periods (Valentine’s Day and Mother’s Day, in our case).
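Here is a small sketch of how those rules play out, using the robots.txt parser from Python’s standard library. The file contents are made up for illustration (this is not our real robots.txt), and “HungryBot” is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one named crawler entirely,
# and keep every other crawler out of the basket and account areas.
ROBOTS_TXT = """\
User-agent: HungryBot
Disallow: /

User-agent: *
Disallow: /basket/
Disallow: /my-account/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules above permit this agent to crawl the path."""
    return parser.can_fetch(agent, path)
```

With these rules, `allowed("Googlebot", "/flowers/roses")` is true, while `allowed("Googlebot", "/basket/checkout")` and `allowed("HungryBot", "/flowers/roses")` are both false. Bear in mind that this only models a well-behaved crawler; nothing in the file forces a robot to obey it.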

NB robots.txt is not to be confused with the robots meta tag. This is a tag in a page’s HTML that you can use to give a robot specific instructions about that page.

For example, our automated Twitter app “Flowers & Fun” has generated nearly 450k happy birthday messages (including one to President Medvedev of Russia that got picked up by Russian press) each of which is a page on our site.

Rather than have Google index all of these similar-looking pages, we added the robots meta tag “noindex, follow” to each of them. Any link juice coming into the pages still flows through, but the pages themselves stay out of search engine indexes (indices?), where they could make us look spammy and risk earning us a penalty.

Because the main robots.txt file is relatively simple, there is room for humour, although it must be said that this is humour most likely to be read only by geeks. Additionally, because these pages are crawled, links to them should contribute to the site’s overall ranking.
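For anyone who wants to see what that tag looks like and confirm a page is carrying it, here is a minimal sketch using Python’s built-in HTML parser. The sample page and the helper names are my own inventions, not our actual templates.

```python
from html.parser import HTMLParser

# Hypothetical page carrying the robots meta tag in its <head>.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Happy birthday!</title>
    <meta name="robots" content="noindex, follow">
  </head>
  <body>Many happy returns.</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Running `robots_directives(SAMPLE_PAGE)` gives back the two directives, `noindex` and `follow`. A page with no robots meta tag gives an empty set, which search engines treat as permission to index as normal.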

It is no coincidence that SEO companies often do something quirky with their robots page, presumably to capitalise on this opportunity.

A few of my recent favourite robots.txt pages:

That wraps this post up. There have been quite a few posts here and across the web in the past about great 404 pages, but I haven’t seen many on robots.txt (bar this one).

So if you have a great robots.txt to share, please drop a link in the comments.

To give some incentive, we’ll offer a £50 bouquet for UK delivery for the best suggestion received by the 16th of October (my decision is final, etc.). Begin!