CEO at Econsultancy
01 August 2002 10:40am
I read your feature on search engine optimisation in this month’s newsletter (http://www.e-consultancy.com/newsfeatures/newsletter/view.asp?id=361) with interest. I also read your piece on “Optimising dynamic web sites and database driven content” on your own site at http://www.greenlightseo.co.uk/optimising_dynamic_content.htm.
There does seem to me to be quite a challenge here and I’m not sure I’m clear yet on the answer. The challenge is: how do you optimise a database-driven web site for search engines? Let’s take two things as given:
1. Search engine optimisation is important and worth doing
2. Databases / Content Management Systems are the only sensible way to efficiently (cost & quality) manage large-ish web sites
If we take these 2 things as given then it is clear why the challenge is so important. I know I’ve thought about it a fair bit in relation to this site, which is essentially one large database, some smart templates and plenty of business logic tying the two together to produce the front end. Pretty much nothing exists ‘flat’; it is all dynamic, and most of it has to be – the personalisation features such as ‘since your last visit’ clearly cannot exist hard-coded. Increasingly sites are seeing this kind of thing as important for improving the customer experience. So where does that leave SEO?
From all that I’ve read it seems that a simple, keyword-heavy flat HTML page with hard-coded links to other similar pages is best for getting top search rankings. This is not necessarily at odds with a ‘content management system’ – many CMSs can, or can only, publish flat HTML pages. Interwoven is a good example. It is a CMS and it is database-driven (at the development end) but does not produce dynamic sites. It outputs ‘flat’ sites. And there is a lot to be said for this, particularly in terms of performance (flat sites perform much better than ‘dynamic’ ones) and quality control (less chance of errors).
But let’s say we are talking about dynamic, personalised sites i.e. pages that don’t exist until the user requests them and they are then dynamically built. How do you optimise these? I guess there are 3 approaches:
1. They change: you wait until the search engines ‘upgrade’ their spiders so that they can spider such dynamic sites (your article says Google is making such steps)
2. You change: you restructure your site and change the way it works to accommodate the search engine’s spiders
3. You look for a work around
I would say that although most site owners recognise the importance of SEO, it is not as important, say, as effective content management or perhaps an enhanced customer experience through personalisation. I would be very surprised, for example, if I were persuaded to alter the information architecture of e-consultancy (which is driven by business and user needs) to suit the needs of search engine spiders. However, this seems to be what your feature suggests? In your article on your site you say:
“Greenlight search engine optimisation usually tackles these obstacles at layer 1 of its multi layered optimisation programme, with what it refers to as the sub site layer. This deals with the technology on which a website is hosted, and in essence sets a foundation for subsequent optimisation.”
What exactly does this mean? Is it possible to accommodate search engine needs without compromising business and customer needs?
These issues have to date, I believe, left many site owners looking to option 3 above – a work around. ‘Doorway’ pages, which seem to be frowned upon, are a good example. I’m wondering, though: isn’t there a way to use CMS and personalisation technologies to our SEO advantage? For example, we could easily serve customised web pages to search engines’ spiders. We know ‘who’ the spiders are when they arrive, so we could then serve them specially optimised pages to help our rankings. This would not be ‘defrauding’ users or unfairly skewing search results; it would just help improve search rankings.
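To make the idea concrete, here is a rough sketch of the kind of check I have in mind (written in Python purely for illustration; the spider names, page names and handling are invented to show the principle, not something this site actually does):

KNOWN_SPIDERS = ("googlebot", "slurp", "scooter", "fast-webcrawler")  # illustrative list

def is_spider(user_agent):
    # Crude check: does the User-Agent string mention a known spider name?
    ua = (user_agent or "").lower()
    return any(name in ua for name in KNOWN_SPIDERS)

def handle_request(user_agent, page_id):
    if is_spider(user_agent):
        # Keyword-rich flat HTML with hard-coded links, pre-published for spiders.
        return "<html><body><h1>Optimised page: %s</h1></body></html>" % page_id
    # Normal route: the personalised, database-driven page for human visitors.
    return "<html><body>Personalised page: %s</body></html>" % page_id

print(handle_request("Googlebot/2.1 (+http://www.googlebot.com/bot.html)", "about-us"))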
What do you think?
Senior SEO at Weboptimiser
11 December 2002 12:50pm
>> how do you optimise a database-driven web site for search engines?
The short (best) answer is "You cheat". The real question is "How do you get a search engine to list what you WANT it to see?"
There are 2 possibilities:
1) You pay. For largish (min 500 page) sites, or clients of larger SEO outfits, there is a little-known route to inclusion in some of the SEs' databases called "XML trusted feed", which, when you get right down to it, is "legal" cloaking. The difference is that you are paying the SEs for the privilege of spamming them, so, magically, it's alright.
2) You pay. This time, you track down a GOOD SEO/cloaking outfit and pay them not insubstantial amounts of money. A word of warning: you are unlikely to find a really good proponent of this black art on the Web. These guys do everything they can to stay OFF the search engines' radar (because if they can see you, they can track you, and if they can track you, they can ban you). They may have a small website, but it won't have a client list (if you were doing this, you wouldn't want anyone to know either). You will find them by personal recommendation, or painstaking research. When you find them, they will decide if they want you as a client, not the other way round.
There are other options. A bit of server-side coding to strip the troublesome "?" character from your URLs and replace them with spider-friendly pseudo-static addresses will work wonders, or something similar depending on your site's structure. This presents some problems if you have a churn of elements within a given page, but it's usually liveable.
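As a rough illustration of the sort of translation layer I mean (a minimal Python sketch with a made-up URL scheme, not anything from a real site):

import re

# Hypothetical scheme: /products/widgets/123.html stands in for
# /products.asp?cat=widgets&id=123, so the spider never sees a "?".
PATTERN = re.compile(r"^/products/(?P<cat>[\w-]+)/(?P<id>\d+)\.html$")

def rewrite(path):
    # Translate the pseudo-static address back into the parameters the
    # dynamic script actually needs; return None to fall through to
    # normal handling.
    m = PATTERN.match(path)
    if not m:
        return None
    return {"cat": m.group("cat"), "id": m.group("id")}

print(rewrite("/products/widgets/123.html"))  # {'cat': 'widgets', 'id': '123'}

In practice you would do this with whatever your server gives you (mod_rewrite on Apache, a filter on IIS, and so on); the sketch just shows the mapping.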
>> we could easily serve customised web pages to search engines’ spiders.
Yes, you could. If you are caught doing it (without paying) they could also drop you from their index, and blacklist you. And tell their friends. And look for any other domains you own (did you know it's possible to acquire a CD with a dump of DNS data on it? Or get live info by running a DNS node?) and dump/blacklist them too. But then again, they might not <grin>; it seems to depend on whether that particular search engineer's girlfriend walked out on him that morning or not.
11 December 2002 18:56pm
Thanks Brendon - that's very helpful. Do you know where I can find out more about these mysterious "XML trusted feeds"? I can't find anything on Google about them, for example, though I have heard about them elsewhere. How much does it cost? What's the process involved?
CTO at The Search Works Intl Ltd
12 December 2002 01:06am
>Do you know where I can find out more about these mysterious "XML trusted feeds".
Each search engine has its own version of trusted feeds except Google. Try http://uk.altavista.com/web/trustedfeed and Inktomi's Index Connect system, available through various 3rd parties.
Google doesn't do Trusted Feed at all and picks up most web sites on its own, even dynamic ones, although it depends just how dynamic the site is.
>> how do you optimise a database-driven web site for search engines?
In largely the same way as optimising an HTML web site, but using PHP, ASP, Java, etc. instead of HTML, and writing it so spiders can find the content.
Cloaking and doorways are ultimately a second-best service. Not because they're difficult to do, but because they don't produce the results in terms of ranking or traffic they used to. This is because PageRank, or variants of it, has become a much bigger part of the algorithms over the last couple of years, and no one's going to link to a cloaked site. The best way with dynamic sites is to construct the site in a way which allows and encourages spiders to find the actual content.
Trusted feeds are one way to do this, by feeding the content straight to the spider; this tends to be on a pay-per-click basis. There are various other ways to make dynamic sites visible, using server configurations or even just a few simple site alterations. It all depends on what you've got to work with.
CTO at Econsultancy
12 December 2002 10:38am
>1) You pay. For largish (min 500 page) sites, or clients
>of larger SEO outfits, there is a little-known route to
>inclusion in some of the SEs' databases called "XML
>trusted feed", which, when you get right down to it,
>is "legal" cloaking. The difference is that you
>are paying the SEs for the privilege of spamming them,
>so, magically, it's alright
How does one go about doing this? I have heard about this before, but am also not sure which of the main search engines support this, and how much it would cost.
You also mentioned the querystring ? in URLs affecting the indexing of your pages. Do search engines ignore URLs with query strings in them? If so, would it be just as effective to offer a route for spiders to index which does not contain ? in the URLs, and a different one for users? Or would this be ineffective, as external sites would then be linking to the pages with the query strings in them?
If crawlers don't normally index pages with ? in them, do they not access pages within paged results, i.e. 10 pages of results, 1 - 10 and so on?
12 December 2002 12:26pm
>>>How does one go about doing this?
XML feeds are provided through a number of search engine partners; each has the expertise to integrate the requirements of the feed with the site architecture. Probably the best known is http://www.positiontech.com/ but there are a number of other trusted feed partners.
Trusted feeds are offered by Inktomi, AltaVista, FAST and AskJeeves, which covers most of the major portals. Costs tend to be a price per click generated from a page in the feed, with a minimum charge per page per month. Pricing is dependent on the marketplace you're in, so loans traffic will be more expensive than holiday home traffic.
>>>querystring ? in URLs affecting the indexing of your pages.
Search engines do list URLs with ? query strings in them, but the URLs need to be linked from other pages to avoid spiders getting caught out by infinite versions of the same page. We've just done one for a company where the search facility now produces results pages optimised for the visitor's search query and containing the appropriate content drawn from their database. Linking to a selection of the more important results pages gets them indexed and generating highly targeted traffic. It's a case of constructing a solution to fit the individual site architecture of each web site. No two are the same.
CEO at Greenlight
12 December 2002 13:04pm
>>> avoid spamming
It is possible to optimise database-driven sites without cloaking or cheating. Cheating using cloaking invariably ends up backfiring eventually, as do many other forms of SEO spam, in spite of often impressive early results. Google invests a lot of time in scaling anti-cloaking and anti-spam solutions, and since they are the ones who know the algorithm, you are ultimately fighting a losing battle.
Getting any website indexed in search engines comes down to it being able to be crawled, read and found relevant. DB-driven or not, dynamic or otherwise, if there is no way to crawl from one page to another via static links you will fail to be indexed, and even if you do get crawled, if you're not one of the 10 most relevant pages out of however many hundreds, thousands or even millions, you still won't rank.
You need to develop crawl paths and ensure that your information architecture doesn't leave the essential pages of content, the ones that will earn you the listings, locked up in a folder somewhere with no way to get to them unless the user performs some kind of action.
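To put it another way, a crawl path can be as simple as a flat index of static links generated straight from the database. A rough sketch in Python, with invented article names, of the kind of thing I mean:

# Rough sketch with invented data: publish a flat "site index" page of plain
# <a href> links so a spider can reach every article without performing a
# search or any other user action.
articles = [
    ("search-engine-optimisation", "Search engine optimisation"),
    ("content-management", "Content management"),
    ("personalisation", "Personalisation"),
]  # in reality, pulled from the content database

items = "\n".join(
    '<li><a href="/articles/%s.html">%s</a></li>' % (slug, title)
    for slug, title in articles
)
page = "<html><body><h1>Site index</h1><ul>\n%s\n</ul></body></html>" % items

with open("site_index.html", "w") as f:
    f.write(page)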
Assuming reaching pages is not a problem you will then need to sort out your relevance for target terms. But we'll do that on another thread.
>>>As for Trusted feeds
What they allow you to do is insert custom page information specified in XML format directly into their searchable database, rather than have it indexed by spiders in the traditional and often detrimental way.
The advantages are that it allows you to get large amounts of content indexed very rapidly in a fairly controlled manner, and to custom-define what is displayed in the listings. We’ve used them very successfully for several clients now, and the traffic has been fairly impressive, although given the engines that trusted feeds target, they may benefit consumer sites more than B2B ones.
The success of the exercise, though, lies in the creation of a clean and well-optimised XML feed. Essentially it’s an inclusion tool, so including poor, irrelevant content still won't solve your ranking problems. You need to make sure the information going in is relevant enough to yield a listing.
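By way of illustration only, building such a feed is not complicated. The Python sketch below uses invented element names, since each engine's trusted feed partner specifies its own schema, but the principle of feeding clean per-page records straight into the index is the same:

from xml.sax.saxutils import escape

# Invented element names and example data, purely illustrative.
pages = [
    {"url": "http://www.example.com/reports/email-marketing.html",
     "title": "Email marketing report",
     "description": "Research and best practice on email marketing."},
]  # in reality, drawn from the CMS database

records = []
for p in pages:
    records.append(
        "  <record>\n"
        "    <url>%s</url>\n"
        "    <title>%s</title>\n"
        "    <description>%s</description>\n"
        "  </record>" % (escape(p["url"]), escape(p["title"]), escape(p["description"]))
    )

print("<feed>\n%s\n</feed>" % "\n".join(records))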
Currently AltaVista, FAST and Inktomi have trusted feed programmes set up. Google won't do it, as it undermines their algorithm, and their very reason for being.
We have a complete trusted feed creation and management programme if you're interested. More info at http://www.greenlight.co.uk/optimisation/cpc-seo.htm.
Director, & Consultant SEOP
t: 0208 493 3780
m: 07958 764476
Your total search engine marketing partner
Optimisation | Consultancy | Submission | Trusted Feed | PPC | Training | Web Analytics
Visit us at http://www.Greenlight.co.uk
13 December 2002 10:23am
>> Search engine robots cannot perform any other action other than following html links
Not quite true any more. Googlebot will pull links (but not content) out of Flash files, and FAST's bot will pull both links and content out of Flash files, following their deal with Macromedia to integrate with the Flash SDK 1.0 technology. However, maintaining the "HTML links good, JS navigation bad" mindset is still good practice. There is also limited evidence that Google is prepping a JS parser, as I have recently started seeing reports of .js files being requested.
>> Currently AltaVista, FAST and Inktomi have trusted feed programmes set up. Google won't do it, as it undermines their algorithm, and their very reason for being.
Ask Jeeves also have a beta programme going (actually run through Teoma, which I don't think is properly integrated into the .co.uk site yet). Note also that the Lycos programme includes Lycos, FAST and Inktomi.
13 December 2002 10:42am
>> Do search engines ignore URLs with querystrings in them?
They don't ignore them, but they tend to treat them with caution. The spiders are better at processing URLs with ?s now, but can still get trapped (Scooter seems to be especially prone to it).
You can get dynamic pages indexed, even into the top ten, but you need a powerful reason for the spider to visit (i.e. high PageRank or equivalent) and to offer it a method of identifying the valid options (i.e. have all the options in a drop-down list, or similar).
Finding a way of eliminating the ?s is usually the best option, so use mod_rewrite, server-side rules, whatever fits. A competent web tech or SEO will be able to advise you. Some dynamic systems that routinely use ?s often have a fix available, e.g. ColdFusion.
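The other half of the job, to follow on from my earlier point about pseudo-static addresses, is making sure your own templates emit the clean form of each link, so spiders and anyone linking to you pick that form up. A minimal Python sketch, with invented parameters:

# Sketch with invented parameters: always emit the clean, spider-friendly
# form of a link in your own templates, so internal links (and anything
# external sites copy) avoid the "?" altogether.
def product_url(cat, product_id, clean=True):
    if clean:
        return "/products/%s/%d.html" % (cat, product_id)
    return "/products.asp?cat=%s&id=%d" % (cat, product_id)

print(product_url("widgets", 123))         # /products/widgets/123.html
print(product_url("widgets", 123, False))  # /products.asp?cat=widgets&id=123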
Online Marketing Consultant at Box UK - www.boxuk.com
08 January 2003 15:38pm
>>> Do search engines ignore URLs with querystrings
>They don't ignore them, but they tend to treat them with
>caution. The spiders are better at processing URLs with ?s
>now, but can still get trapped (Scooter seems to be
>especially prone to it).
In our experience, search engines are getting better at dealing with URLs that contain query strings (the ? and the parameters that follow it).
In the current search engine climate, Google is by far the most important in terms of market share - and Google seems to be indexing far more pages that contain these query strings.
We have dynamic sites that have been live for only a few months and Google has spidered and indexed every page of these sites.
What Google does not like are URLs that include session ids. These will stop the googlebot dead in its tracks, preventing further pages being spidered and indexed by Google.
With some of our sites, human visitors using IE or Netscape would not see the session id in their URLs (as long as they had cookies enabled), but as Google's spider does not accept cookies, it would be presented with a session id in the URL. This would prevent the spider going any deeper into the sites, restricting the number of pages that Google would index.
Using either IP delivery or User-Agent delivery, it is possible to determine that the visitor to the site is a search engine spider and to ensure that a session is not started. This allows the spider to view all the pages of the site, seeing exactly what a human visitor would see (except for any pages that require the user to log in, etc.).
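The check itself is trivial. A minimal Python sketch of the idea (the spider list is illustrative and the session handling is stubbed rather than taken from any real framework):

import uuid

SPIDER_SIGNATURES = ("googlebot", "slurp", "scooter", "fast-webcrawler")  # illustrative

def is_spider(user_agent):
    ua = (user_agent or "").lower()
    return any(sig in ua for sig in SPIDER_SIGNATURES)

def link_to(path, user_agent, accepts_cookies):
    # Only append a session id for human visitors who won't take a cookie;
    # recognised spiders always get the clean URL.
    if is_spider(user_agent) or accepts_cookies:
        return path
    return "%s?sessionid=%s" % (path, uuid.uuid4().hex)

print(link_to("/reports/index.html", "Googlebot/2.1", False))
print(link_to("/reports/index.html", "Mozilla/4.0 (MSIE 6.0)", False))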
This is not an attempt to 'con' the search engines; it is a method of ensuring that they can see the same pages that human visitors can access. This is significantly different from IP cloaking, which I would consider search engine spam. There is no attempt to present a different page to the spider from the one a human visitor would see.
Surely this is an ethical technique that doesn't attempt to 'cheat' the search engines. If Google were to question a site their spider has crawled and view it in a browser, they would see exactly what the spider saw. This has to be best for all parties concerned: the website owner, the person searching on Google, and Google itself.