CEO at Web Diversity Limited
09 January 2003 09:16am
Agree with Daniel wholeheartedly about the session ID issue, but any form of IP delivery or User-Agent delivery is still classed as cloaking and frowned upon by search engines.
There are better ways of resolving this, including mod_rewrite and a host of scripting variations.
No company that uses dynamic content should experience difficulty with getting their content indexed.
Unfortunately, many of the content management system makers worked in splendid isolation when making their products and never looked at how the pages produced would be found by search engines.
This has resulted in many companies paying more than they need to in order to get decent traffic volumes to their sites. Make sure you add this extra cost to the purchase price of the CMS.
Jim Banks
Web Diversity Limited
http://www.webdiversity.co.uk
On 15:38:02 8 January 2003 Daniel Phillips wrote:
>>>> Do search engines ignore URLs with querystrings in them?
>
>>They don't ignore them, but they tend to treat them with
>>caution. The spiders are better at processing URLs with ?s
>>now, but can still get trapped (Scooter seems to be
>>especially prone to it).
>
>In our experience, search engines are getting better at
>dealing with URLs that contain query strings such as ?s.
>
>In the current search engine climate, Google is by far the
>most important in terms of market share - and Google seems
>to be indexing far more pages that contain these query
>strings.
>
>We have dynamic sites that have been live for only a few
>months and Google has spidered and indexed every page of
>these sites.
>
>What Google does not like are URLs that include session
>ids. These will stop the googlebot dead in its tracks,
>preventing further pages being spidered and indexed by
>Google.
>
>With some of our sites it was the case that human visitors
>using IE or Netscape would not see the session id in their
>URLs (as long as they had cookies enabled), but as
>Google's spider does not accept cookies, it would be
>presented with a session id in the URL. This would
>prevent the spider going any deeper into the sites,
>restricting the number of pages that Google would index.
>
>Using either IP Delivery or User-Agent Delivery it is
>possible to determine that the visitor to the site is a
>search engine spider, and to ensure that a session is not
>started. This will allow the spider to view all the pages
>of the site - seeing exactly what a human visitor to the
>site would see (except for any pages that require the user
>to log-in, etc.).
>
>This is not an attempt to 'con' the search engines, it is
>a method to ensure that they can see the same pages that
>human visitors can access. This is significantly
>different from IP Cloaking, which I would consider as
>search engine spam. There is no attempt to try to present
>a different page to the spider than what a human visitor
>would see.
>
>Surely this is an ethical technique that doesn't attempt
>to 'cheat' the search engines. If Google were to question
>a site that their spider has crawled and viewed the site
>via a browser, they would see exactly the same as the
>spider. This has to be something that is best for all
>parties concerned - the website owners, the person
>searching on Google, and for Google itself.
>
>Regards,
>
>Daniel
Online Marketing Consultant at Box UK - www.boxuk.com
09 January 2003 10:54am
>On 09:16:33 9 January 2003 webdiversity wrote:
>Agree with Daniel wholeheartedly about the session ID
>issue, but any form of IP delivery or User agent
>delivery is still classed as cloaking and frowned upon by
>search engines.
There seem to be several schools of thought on what is considered 'cloaking'.
I would definitely condemn the practice of delivering different content to search engines in order to achieve a high search engine ranking. This method is used frequently, but can result in your site being banned from a search engine's listings.
However, what I described wasn't a method of trying to con search engines.
I would also argue with the notion that search engines frown upon the use of IP or UA delivery. Yes, if it's abused it can be used unethically, and the search engines will not be impressed. However, the technology itself is not at fault.
In fact, Google itself practises IP delivery. AdWords ads can be restricted so that they only show in specific countries: Google detects the country the user is in from their IP address and decides whether or not to show the ad.
Used incorrectly I would agree that it is frowned upon by search engines. Used correctly and ethically, I think it is a valid option as a technique to help search engines access database-driven sites. Of course you are right that there are alternatives, such as mod_rewrite, and I would be interested to hear about people's experiences with the different techniques and their relative merits and drawbacks.
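To make the session ID point concrete, here is a minimal sketch in Python of the User-Agent variant. It is only an illustration under assumptions, not anyone's production code: the crawler token list, the `sid` parameter name and the `build_link` helper are all hypothetical. The page content is the same for everyone; the only thing that changes is whether a session ID is appended to internal links.

```python
# Hypothetical and deliberately incomplete list of crawler User-Agent
# substrings (an assumption for this sketch, not an authoritative list).
CRAWLER_TOKENS = ("googlebot", "scooter", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a known spider."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def build_link(path: str, user_agent: str, has_cookie: bool, session_id: str) -> str:
    """Build the href for an internal link.

    Visitors with cookies carry the session in the cookie, and spiders get
    no session at all, so both see clean URLs. Only cookieless human
    visitors get the session ID appended ("sid" is a made-up parameter name).
    """
    if has_cookie or is_crawler(user_agent):
        return path
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"


print(build_link("/whitepapers/view.asp?id=55", "Googlebot/2.1", False, "abc123"))
print(build_link("/whitepapers/view.asp?id=55", "Mozilla/4.0 (compatible; MSIE 6.0)", False, "abc123"))
```

Note that User-Agent strings are trivially spoofed, which is one reason some people prefer matching on known spider IP addresses instead.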
Regards,
Daniel
----------------------------------------
Daniel Phillips
Box UK - Internet Development & Consultancy
http://www.boxuk.com
Founder / Director / Co-founder at easyBacklog / Aqueduct / Econsultancy
17 January 2003 03:25am
I have just found an extremely useful website which helps to explain many of the mysteries of search engine optimisation, explains many methods of search engine "cheating" and explains why not to use them. It also goes into some depth about how Googlebot works.
Have a look at http://www.evolution7.com
BTW, there has been some discussion about using mod_rewrite to create crawler-friendly URLs. For IIS users, there are tools similar to Apache's mod_rewrite from two companies: http://www.qwerksoft.com and http://www.isapirewrite.com. I am currently integrating the component from isapirewrite.com, and it seems to be working perfectly. It lets me create complex mappings from URLs to an ASP page, adding the required querystring data. For example,
/whitepapers/55.asp can be mapped to /whitepapers/view.asp?id=55. This is all done using regular expressions, so brush up on your regular expressions...
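For anyone who wants to see the idea behind that mapping without installing anything, here is a rough sketch in Python. It is not the rule syntax of mod_rewrite or of either IIS product; the `REWRITE_RULES` table and the `rewrite` function are made up purely to show how a regular expression turns the crawler-friendly URL into the real one.

```python
import re

# Each rule pairs a pattern for the friendly URL with the internal URL
# template; \1 is replaced by the first captured group (the paper id).
REWRITE_RULES = [
    (re.compile(r"^/whitepapers/(\d+)\.asp$"), r"/whitepapers/view.asp?id=\1"),
]


def rewrite(path: str) -> str:
    """Return the internal URL for a friendly path, or the path unchanged."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


print(rewrite("/whitepapers/55.asp"))    # /whitepapers/view.asp?id=55
print(rewrite("/whitepapers/view.asp"))  # no rule matches, left unchanged
```

The visitor and the spider only ever see /whitepapers/55.asp; the querystring version is used internally by the server.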
Senior SEO at Weboptimiser
20 January 2003 13:59pm
Evolution7, nice site. Good info on there too. Their approach to SEO jibes nicely with my own, which improves my opinion of them also :-)
As regards the URL rewriting, yes tools like those are great. They can put what would be fairly technical fixes in reach of most website owners, and the benefits in terms of traffic can be stunning.
To misquote, "If they can find you, they will come..."
Director of User Experience at Isotoma
28 January 2003 15:05pm
I agree with your comment about commercial CMSs:
>Unfortunately, many of the content management system
>makers worked in splendid isolation when making their
>products and never looked at how the pages produced would
>be found by search engines.
The site I'm currently working on has a CMS which, as one of its features, adds a &SourcePageID=x to the URI. In other words, the URI reports the originating page. One side effect of this is inactivating "visited links", as you can have many unique URIs for the same page, depending on which page the link appears on.
This discussion made me realise that another side effect would be that the site could appear many times its actual size to crawlers, being indexed many times over and interfering with PageRank algorithms.
Does this sound like a valid concern? What potential benefit does the &SourcePageID information have? User journey analysis?
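To illustrate the duplication, here is a small Python sketch that collapses those variants back to a single address by dropping the tracking parameter. The `canonical_url` helper and the example.com URLs are hypothetical; only the SourcePageID parameter name comes from the CMS described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str, drop=("sourcepageid",)) -> str:
    """Return the URL with the named query parameters removed.

    Parameter names are compared case-insensitively; everything else
    in the URL is kept in its original order.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in drop
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two links to the same page, found on two different source pages:
a = canonical_url("http://example.com/page.asp?id=7&SourcePageID=3")
b = canonical_url("http://example.com/page.asp?id=7&SourcePageID=12")
print(a == b)  # True: both reduce to http://example.com/page.asp?id=7
```

A crawler that does not do this kind of normalisation has no way of knowing the two addresses are one page.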
Technical Director at Box UK
30 January 2003 10:17am
This method of passing the previous page ID through the URL seems a bit superfluous; the HTTP specification defines a 'Referer' request header (that is how the spec spells it), which most (all?) browsers send, and which contains the URL of the previous page (and therefore the ID of the previous page, if that is passed through the query string).
This information can also be recorded in web log files, allowing 'user path' stats to be created (I know that WebTrends uses this, possibly other packages too).
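As a rough idea of how such 'user path' stats can be pulled out of a log file, here is a Python sketch. It assumes the Apache "combined" log format, where the referrer is the quoted field after the status code and byte count; IIS logs are laid out differently, and the `page_transitions` function and sample lines are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request, status, size and referrer fields of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def page_transitions(lines):
    """Count (previous page, requested page) pairs using the logged referrer."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("referer") in ("", "-"):
            continue  # unparseable line, or no referrer was sent
        ref = urlsplit(match.group("referer"))
        # Keep only path and query; this sketch does not check the host,
        # so referrers from other sites are counted as well.
        source = ref.path + ("?" + ref.query if ref.query else "")
        counts[(source, match.group("path"))] += 1
    return counts


sample = [
    '10.0.0.1 - - [30/Jan/2003:10:17:00 +0000] "GET /whitepapers/view.asp?id=55 HTTP/1.0" 200 5120 "http://www.example.com/whitepapers/" "Mozilla/4.0"',
    '10.0.0.1 - - [30/Jan/2003:10:16:40 +0000] "GET /whitepapers/ HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]
for (source, target), hits in page_transitions(sample).items():
    print(source, "->", target, hits)
```

No extra URL parameter is needed for this; the clean URL stays the same whichever page the visitor came from.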
On 15:05:18 28 January 2003 fjordaan wrote:
>The site I'm currently working on has a CMS which, as one
>of its features, adds a &SourcePageID=x to the URI. In
>other words, the URI reports the originating page. One
>side effect of this is inactivating "visited
>links", as you can have many unique URIs for the same
>page, depending on which page the link appears on.
>
>This discussion made me realise that another side effect
>would be that the site could appear many times its actual
>size to crawlers, being indexed many times over and
>interfering with PageRank algorithms.
>
>Does this sound like a valid concern? What potential
>benefit does the &SourcePageID information have? User
>journey analysis?
Tala Sabi-aish
Online Marketing Channel Manager at Adam Phones Ltd
30 June 2006 12:50pm
How valid is this post now? Have things changed much since 2002?