A website I run is undergoing a makeover and is down for the day, and I wanted to show somebody the old version. As such I aimed for the Google cache, which is useful in this sort of situation.
I noticed that the cache had updated in the early hours of the morning, and as such I couldn’t see our old site. Bugger.
It seems that Google is caching news sites with increasing frequency. Yet some newspaper websites don't like Google caching at all...
I thought I’d check this out. First, I aimed for the BBC’s cache, to see how recently that had been updated. Turns out that Google had cached it within the hour. I then looked at the Guardian… same result. It too had been cached within the hour. Pretty quick!
But oddly, a few other mainstream publishers don’t have cached versions of their sites in Google. The Telegraph doesn’t allow it. The Sun says ‘No to caching!’. And The Times isn’t cache-friendly either.
None of those publishers are protecting a subscription access model (unlike, say, the New York Times, which prevents caching for more obvious reasons), so it is a bit of a head scratcher.
Why stop caching?
You can add a ‘noarchive’ robots meta tag to the <head> section of your page to prevent Google from caching it. But why a publisher would do this to its homepage is a mystery to me.
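For the curious, here's a minimal sketch of what that looks like. The `robots` meta tag with a `noarchive` value is the standard directive; swapping `robots` for `googlebot` would target Google's crawler specifically. The page itself is just a placeholder:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Tells search engine crawlers not to store a cached copy of this page -->
  <meta name="robots" content="noarchive">
  <title>Example publisher homepage</title>
</head>
<body>
  <p>Page content here.</p>
</body>
</html>
```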
One possible reason was suggested to me via Twitter, after I asked the SEO ninjas to answer the question. iCrossing said:
“News sites often do it to stop cached versions of a page being available when they remove or change the story.”
I guess that makes sense but then again, it doesn’t. Why would you prevent users from seeing older versions of your pages? What is there to be gained from such an approach?
People only really aim for the cache when the live website in some way fails, say after a major spike in traffic (such as The Digg Effect). It’s almost always a second-best option, an alternative. So why stop people from seeing it, should they need to?
Time for a rethink…
If, as iCrossing suggests, newspaper publishers are worried that Google is caching historic versions of their fast-moving homepages, then they may want to think again.
At a guess, we reckon that Google updates its cache whenever it indexes a website. Given that news sites attract Newsbot (Google’s news crawler, which visits regularly), wouldn’t it be the case that their caches would update multiple times per day?
The answer is that yes, this does indeed happen, whether it’s through Newsbot or by some other method.
If we look again at the news sites that allow caching – the BBC and the Guardian – we can see that Google is generating multiple caches per day.
Word to the wise: both sites have had their caches updated at least three times in the past 90 minutes.
So with that in mind, we can pour cold water on the theory that some newspaper publishers prevent Google from caching for fear of it displaying old news.
Am I missing something? Maybe there’s a business reason behind this no-caching policy? Are there advertising issues?
If anybody knows then please leave a comment below, otherwise let’s assume that weird ideological issues are the force behind this policy.
[Image by Dru Broomfield on Flickr. Various rights reserved.]