Google's Webmaster Tools blog has just published a useful presentation, which provides advice on getting your pages crawled and indexed by the search engine.

Basically, the Googlebot can only crawl and index a small proportion of all the content online, so streamlining your site to reduce unnecessary crawling can optimise the speed and accuracy of your indexing.

Here are some of Google's tips; there is more detail in the full slideshow...

  • Remove user-specific details from URLs.
    For faster crawling and indexing, remove details that are specific to the user, such as session IDs. This reduces the number of URLs pointing to the same content and speeds up indexing.
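    For instance, a session ID embedded in the URL creates many addresses for a single page (the URLs below are hypothetical):

    ```text
    # Two URLs, one page — each visitor gets a different session ID:
    https://www.example.com/product/widget?sessionid=8f3a2c
    https://www.example.com/product/widget?sessionid=b71d09

    # With the session ID removed, one URL serves the content:
    https://www.example.com/product/widget
    ```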
  • Look out for infinite spaces
    By 'infinite spaces' Google is referring to large numbers of links with little new content to index. This could be a calendar with links to future dates or an e-commerce website's filtering options which can produce thousands of unique URLs.

    All these extra links mean that the Googlebot is wasting its time trying to crawl all these URLs. Google has some suggested fixes, such as adding the nofollow attribute to such links.
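    As a sketch, a faceted-navigation link marked with the nofollow attribute might look like this (the URL and filter parameters are hypothetical):

    ```html
    <!-- One of thousands of filter combinations with little new content;
         rel="nofollow" asks Googlebot not to follow the link -->
    <a href="/shoes?colour=red&amp;size=9&amp;sort=price" rel="nofollow">Red, size 9, by price</a>
    ```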

  • Disallow actions that Googlebot can't perform
    Googlebot cannot log in to pages or submit contact forms, so using the robots.txt file to disallow these URLs saves the time that would otherwise be wasted attempting to crawl them.
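    A minimal robots.txt sketch, assuming the login and contact pages live at these hypothetical paths:

    ```text
    # Keep Googlebot away from pages it cannot act on
    User-agent: Googlebot
    Disallow: /login
    Disallow: /contact-form
    ```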

  • Watch out for duplicate content
    According to Google, the closer you can get to one unique URL for each unique piece of content, the more streamlined your website will be for indexing and crawling.

    This is not always possible, so indicating the preferred URL with the rel=canonical element, as described in this video from Matt Cutts, will address the problem.
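    As a sketch (the URLs are hypothetical), the rel=canonical element sits in the head of each duplicate page and points to the preferred URL:

    ```html
    <!-- Placed on /product/widget?ref=newsletter and other duplicate URLs -->
    <link rel="canonical" href="https://www.example.com/product/widget" />
    ```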


Published 11 August, 2009 by Graham Charlton

Graham Charlton is the former Editor-in-Chief at Econsultancy.


Comments (1)


Clayton Leis

Wouldn't disallowing all of these pages cause some of your internal link juice to disappear into these "black holes"? For example, say every page on your website linked to the 'contact us' page. If you disallowed that page, then every page on your site is passing link juice to the contact us page, but that link juice isn't being passed back through the links on that contact page.

Sure, it may not be a gigantic difference, but if you're not having indexing issues, why bother?
