An important part of successfully managing your search engine optimisation is to nudge the search results your way when it's in your control. To help you achieve this, there are some links you should prevent the engines from indexing in the first place.

Firstly, because they offer little or no user experience benefit; secondly, because they might get indexed instead of the desired content; and lastly, because preventing the engines from crawling unnecessary pages will reduce your bandwidth costs.

Here are a few links you really don't want indexed:

1. HTTPS versions of your pages - To check whether Google is indexing any HTTPS versions of your pages, simply search for [inurl:https site:examplesite.com]. Fix this by redirecting HTTPS to HTTP using your .htaccess file, with the exception of login pages.
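As a minimal sketch, the .htaccess redirect might look like this (the /login path is an assumption; swap in whatever your login pages actually use):

```apache
# Sketch: 301-redirect HTTPS requests back to HTTP,
# leaving the login area (path assumed here) on HTTPS
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_URI} !^/login [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

The 301 status tells the engines the move is permanent, so they should consolidate the HTTPS URLs into the HTTP versions over time.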

2. Development server - Check whether Google is indexing your development server by searching for [site:development-site.com]. Fix the situation by restricting the site from getting indexed: use 'Disallow: /' in the robots.txt file (no guarantee that it will always work), place the site behind a username and password, or simply grant access based on your company's IP address range.
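The password and IP options can be combined in the development server's .htaccess; a sketch, where the htpasswd path and IP range are placeholders:

```apache
# Apache 2.2-style sketch: let the office IP range through,
# prompt everyone else (including crawlers) for a password
AuthType Basic
AuthName "Development server"
AuthUserFile /path/to/.htpasswd
Require valid-user
Order deny,allow
Deny from all
Allow from 192.0.2.0/24
Satisfy Any
```

Unlike robots.txt, this blocks crawlers outright rather than politely asking them to stay away.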

3. Affiliate links - If you're using one of the affiliate networks, chances are this won't matter much, as the link juice won't flow your way anyway, but to the affiliate network. If, on the other hand, you're using off-the-shelf software to manage your affiliates in house, the affiliate URL might look something like www.sitename.com/?affrt1|2|3|4|etc. To find out whether you've got any affiliate URLs indexed, search for [inurl:<ID-code> site:examplesite.com]

4. Pay per click agency links - Similar to off-the-shelf affiliate software, some PPC agencies might use proprietary tracking software whose URLs can get indexed. You can restrict these pages from appearing in the index using the robots.txt file and a wildcard query.
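A robots.txt sketch of that wildcard approach, reusing the "affrt" parameter from the affiliate example above (swap in your own tracking parameter):

```text
# Block any URL carrying the tracking parameter
User-agent: *
Disallow: /*?affrt
```

Note that wildcard matching in Disallow is an extension honoured by Google and the other major engines, not part of the original robots.txt standard, so don't rely on every crawler respecting it.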

5. Google URL Builder links - I am a huge fan of the Google URL Builder tool and often use it to track conversions from Google Base, newsletters and much more. The only complaint I have is that the URL puts the tracking parameters after a question mark (?), which does not prevent the page from getting indexed. The solution would have been to use a hash (#) instead of the ?, since www.examplesite.com/?url-id will get indexed where www.examplesite.com/#url-id won't (everything after the # is never sent to the server, so the engines treat it as the same page). Searching Google for [allinurl:utm_source utm_medium], which are parts of the tracking code, brings up some interesting results. While there are some hacks which allow you to tweak the Google Analytics tracking code to use #, the quickest solution is to use the canonical tag across the site and hope that Google will drop the other versions of the pages from its index.
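The canonical tag itself is a single line in the <head> of each page, pointing the tagged variants back at the clean URL (the domain and path here are placeholders):

```html
<!-- Served on both www.examplesite.com/page/ and
     www.examplesite.com/page/?utm_source=newsletter&utm_medium=email -->
<link rel="canonical" href="http://www.examplesite.com/page/" />
```

Because the same tag appears on the tracked and untracked versions, Google should fold the utm_ variants into the canonical URL.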

6. Web based newsletter copies from way, way back - To check if Google is indexing any old copies, search for the directory on Google using [site:examplesite.com/directory-name/]. As these pages might have some value, you should 301 redirect them to a more appropriate page on the site. This method is often an easy way to pick low hanging fruit, as you're likely to win some more backlinks.
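A one-line .htaccess sketch for that redirect, keeping the post's placeholder directory name and assuming a hypothetical target page:

```apache
# Send everything under the old newsletter directory to one current page
# (a per-page mapping would preserve more relevance if you have the time)
RedirectMatch 301 ^/directory-name/ http://www.examplesite.com/newsletter/
```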

7. Shopping cart and log in pages - You should keep them out of the index, not necessarily because of content duplication issues, but because they offer no value to searchers.
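A robots.txt sketch covering both (the /cart/ and /login/ paths are assumptions; match them to your own URLs):

```text
# Keep functional, zero-search-value pages out of the index
User-agent: *
Disallow: /cart/
Disallow: /login/
```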

What's your search engine index strategy?

Photo credit: anna maria lopez lopez via stock.xchng


Published 24 April, 2009 by Ran Nir

Ran Nir is founder of Conversion Counts, a web analytics and conversion optimisation agency, and a contributor to Econsultancy. He can also be found on Twitter and LinkedIn.



Comments (2)


David Iwanow, SEO Product Manager at Marktplaats.nl

Don't forget that if you are using a blogging platform such as WordPress, your old posts can rank above newer topics; you no longer want last year's event details to be shown first, or maybe ever again.

You can set a page's priority within WordPress, which writes this information to your XML sitemap for all search engines to read.
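For illustration, that hint ends up in the sitemap as an optional <priority> element on each URL entry, with values from 0.0 to 1.0 (the URL below is a placeholder):

```xml
<url>
  <loc>http://www.examplesite.com/2008-event-details/</loc>
  <!-- low priority hint for the outdated post -->
  <priority>0.2</priority>
</url>
```

Priority is only a relative hint to crawlers about which of your own pages matter most; it does not directly demote a page in the rankings.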



David Carralon, Head of SEO EMEA & APAC at Career Builder

I have experienced high rankings with old enewsletters that were floating around on the server, some of them boasting high PageRank... it is definitely a good tip to 301 redirect them to other, not so popular areas of the website that may be related in some way to the enewsletters.

Another thing I have noticed with sites that host client work is that the typical clients/ folder is not excluded in the robots file and is therefore indexed by the engines, e.g.: www.seoagency.com/clients/client1/developemnt.htm etc...


