CMS, SEO and Accessibility



Technical Director at Box UK
08 March 2004 20:14pm
It used to be that content-managed websites spelled doom and gloom for search engine inclusion and ranking. With the ‘opening up’ of search engine algorithms, and a little applied technology, CMS-delivered websites then began to avoid exclusion. Now, with experience and a little effort, a CMS can be used to SEO advantage, allowing easily optimised output and a rapid response to increasingly frequent changes in search engine algorithms.
At a basic level, your CMS should allow its pages to be indexed. There are two common causes of exclusion:
* Query strings – dynamically generated URLs that contain ‘?’ and ‘&’ characters. These can be removed, or replaced, using ‘URL rewriting’ – a technique that involves a web-server plug-in (e.g. mod_rewrite for Apache or IIS_rewrite for IIS), and the conversion of these special characters to a ‘directory’ style URL (e.g. /news/document/latest.html).
* Session IDs. Many modern sites use ‘sessions’ to track a user persistently throughout a site (so that the user remains logged in, or for user-path analysis, etc.). To allow this ‘persistence’ across multiple pages, the CMS creates a unique number (session ID) for the user and stores it in a) a persistent cookie, b) a per-session cookie, or c) the query string (URL) of each internal link. As some users/browsers do not accept cookies, the CMS often falls back to c) when it cannot set a cookie for the user. The Google spider, amongst others, does not accept cookies, so the site may include the session ID in the URLs it serves to the Google spider. Google needs to identify each page uniquely (so that it doesn’t re-index the same page multiple times), but a new session is started on each visit, so the session ID presents Google with a different URL for the same page each time; as Google cannot obtain a single unique URL for each page, it won’t index the site. To prevent this, sessions (or at least URL-based session IDs) should be switched off for search engine spider visits. Spiders can be detected (and sessions switched off accordingly) by looking for the robot’s identifier in the User-Agent HTTP header.
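As an illustration of the ‘directory’ style URL, here is a minimal PHP sketch that builds such a URL from the parameters that would otherwise go into the query string. It assumes the name/value-pair layout used later in this thread (/a/bcd/eg/gh standing in for ?a=bcd&eg=gh); directory_url() and the /site base path are hypothetical, and the web server (mod_rewrite or similar) still has to map these URLs back onto the real script.

```php
<?php
// Hypothetical helper: build a 'directory' style URL from the
// name/value pairs that would otherwise form a query string, e.g.
// ['section' => 'news'] gives /site/section/news instead of ?section=news
function directory_url(string $base, array $params): string
{
    $segments = [];
    foreach ($params as $name => $value) {
        // Encode names and values so characters such as '?', '&' or
        // spaces cannot be confused with URL syntax
        $segments[] = rawurlencode((string) $name);
        $segments[] = rawurlencode((string) $value);
    }
    return rtrim($base, '/') . '/' . implode('/', $segments);
}

echo directory_url('/site', ['section' => 'news', 'item' => 'latest']), "\n";
// /site/section/news/item/latest
```

Each parameter contributes two path segments, so the order of the pairs is preserved and the receiving script can read them back two at a time.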
With these basic problems corrected, you can then begin to optimise your code for better rankings. There are a number of common SEO techniques, which are well documented (see other postings on e-consultancy), but I thought I’d briefly touch on accessibility in relation to SEO.
Many people associate accessibility only with disabled users. However, a better approach is to treat accessibility as the name suggests: providing as much ‘access’ to your site/content as possible. Accessibility therefore also covers access by search engine spiders, users from other languages and cultures, and users of different age groups. A number of ‘accessibility techniques’ handily also provide SEO opportunities:
* Ensure links make sense out of context – if a hyperlink is removed from the text, does it still make sense? For example, a number of sites link to ‘more’, which should be replaced with a more descriptive ‘more news and events’, or similar. Benefit to SEO: some search engines use link text for relevancy.
* Use simple language. The content of your site should be as easy to read as possible, e.g. avoid sector-specific terminology and overly complex wording. Benefit to SEO: using common, neutral language lets the content match a wider range of search terms.
* Validate the HTML (http://validator.w3.org/). To appear in a search engine, your content needs to be indexed by a search engine spider/robot. The spider will attempt to split your content/page into sections before indexing, e.g. header, metadata tags, headings, normal text, etc. To split the content into its components, spiders assume a certain structure – that of valid (X)HTML. If the spider has difficulty working out the structure of your code, some of the text could be misclassified or omitted.
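The first of these checks is easy to automate. The following is a minimal sketch, assuming PHP's DOM extension is available: it parses a page and lists the targets of links whose text is a generic phrase. The function name vague_links() and its list of ‘vague’ phrases are illustrative assumptions, and it is no substitute for a proper accessibility review or for the W3C validator.

```php
<?php
// Hypothetical audit helper: returns the href of every link whose text
// does not make sense out of context. The phrase list is an assumption.
function vague_links(string $html, array $vague = ['more', 'click here', 'here', 'read more']): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world pages are often not valid (X)HTML
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // Normalise the visible link text before comparing
        $text = strtolower(trim($link->textContent));
        if (in_array($text, $vague, true)) {
            $found[] = $link->getAttribute('href');
        }
    }
    return $found;
}

$page = '<p><a href="/news">more</a> <a href="/events">More news and events</a></p>';
print_r(vague_links($page)); // lists /news only
```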
Remember – search engine spiders are amongst the most ‘disabled’ users of the web: they are unable to hear, see formatting, or infer information from structure or colour. By designing in line with accessibility standards (http://www.w3.org/TR/WCAG10/), you are designing for Google.
Any other ‘non-typical’ SEO recommendations out there?
Dan
-----------------------
Daniel Zambonini
Amaxus – XML Content Management System
http://www.boxuk.com/amaxus
Director - Focalpoint at Web Development and Marketing
14 March 2004 13:07pm
Hi Daniel, great article! Would it be possible to post the code for the robot session switch?
Tracy
Technical Director at Box UK
15 March 2004 10:22am
Hi Tracy,
No problem. The following code is specifically for PHP, but it should be a similar process to identify robots in JSP, ASP, etc.
It's a pretty simple (though not entirely foolproof) function that detects the spider/robot:
function is_spider()
{
    $aSpider_id = array('google', 'crawler', 'spider', 'robot', 'seek',
                        'scanner', 'slurp', 'scooter', 'harvest');
    $sAgent = getenv('HTTP_USER_AGENT');
    $bIs_spider = 0;
    foreach($aSpider_id as $sSpider_name)
    {
        if (stristr($sAgent, $sSpider_name))
        {
            $bIs_spider = 1;
            break;
        }
    }
    return $bIs_spider;
}
This function can then be wrapped around the start of the session, e.g.
if (! is_spider())
{
    session_start();
}
Also note that in PHP, registering a session variable with session_register() will start a session automatically if one hasn't already been started. So this function should also be wrapped around any code that registers or sets session values, e.g.
if (! is_spider())
{
    $_SESSION['username'] = $username;
}
and so on. The above function can also be optimised a little for performance by using a 'static' variable that remembers the previously calculated value:
function is_spider()
{
    static $bIs_spider;
    if (isset($bIs_spider))
    {
        return $bIs_spider;
    }
    $aSpider_id = array('google', 'crawler', 'spider', 'robot', 'seek',
                        'scanner', 'slurp', 'scooter', 'harvest');
    $sAgent = getenv('HTTP_USER_AGENT');
    $bIs_spider = 0;
    foreach($aSpider_id as $sSpider_name)
    {
        if (stristr($sAgent, $sSpider_name))
        {
            $bIs_spider = 1;
            break;
        }
    }
    return $bIs_spider;
}
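For testing, it can help to separate the matching from the environment lookup. The following is a sketch of that idea rather than a replacement for the code above: is_spider_agent() is a hypothetical name, it assumes PHP 7.1 or later (for the nullable type and the ?? operator), and it uses stripos() with an explicit !== false check. Like the original, it only inspects the User-Agent header, which any client can set, so it is a convenience rather than a security check.

```php
<?php
// Hypothetical variant of is_spider(): the user agent is passed in
// rather than read from the environment, so the matching logic can be
// exercised with sample strings.
function is_spider_agent(?string $sAgent): bool
{
    $aSpider_id = array('google', 'crawler', 'spider', 'robot', 'seek',
                        'scanner', 'slurp', 'scooter', 'harvest');
    if ($sAgent === null || $sAgent === '') {
        return false; // no User-Agent header: treat as a normal visitor
    }
    foreach ($aSpider_id as $sSpider_name) {
        // stripos() is a case-insensitive search; false means "not found"
        if (stripos($sAgent, $sSpider_name) !== false) {
            return true;
        }
    }
    return false;
}

// In a web request, the agent comes from the HTTP headers
$bIs_spider = is_spider_agent($_SERVER['HTTP_USER_AGENT'] ?? null);
```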
Hope that helps,
Thanks,
Dan
P.S. You'll have to re-indent all the code, this forum removes all my nice formatting!
Director - Focalpoint at Web Development and Marketing
15 March 2004 14:34pm
Dan, this is superb, thank you!
I am fairly familiar with SEO practices, but you are the first (that I have seen) to propose creative solutions to these programming issues.
Would it be possible to get your thoughts on the steps for converting the URLs using the rewrite function you referred to? I'm just not sure where to find this info, and am very eager to give it a try...
Thank you for all your assistance,
Tracy
Technical Director at Box UK
15 March 2004 14:36pm
Of course - which platform/web server? IIS or Apache, on Windows or Unix?
Director - Focalpoint at Web Development and Marketing
15 March 2004 15:21pm
If it is not too much trouble - both. I'd like to try it and see how it works on both platforms. I really appreciate your assistance. Excited to give it a try! Tracy
Technical Director at Box UK
16 March 2004 13:59pm
Hi Tracy,
Right - I'm going to give you my cut-down version, mainly for IPR reasons, but also because I'm not sure e-consultancy is particularly suited to discussing code (I don't want to waste all the non-techies' time on here).
There are four small steps. The following assumes that all content on the site is served through a single PHP script, let's call it 'site.php'. If it's not, it should be relatively simple to make the required changes.
Step 1 - add a .htaccess file into the web root of your site, that has the contents:
<Files site>
ForceType application/x-httpd-php
</Files>
This tells the web server (Apache) to parse the file called 'site' in the web root as PHP, so that any request for /site/whatever is handled by that file. For this to work, you'll have to check that AllowOverride is set correctly in the httpd.conf server set-up file (ForceType needs at least FileInfo). This step is for Apache; for IIS it's slightly different - instead you'd install the IIS_Rewrite ISAPI module, and create a configuration rule that does basically the same thing (i.e. redirects any request for /site/* to site.php).
Step 2 (Apache only). Create a PHP script called 'site' (note - no extension), and in it just include the normal script:
<?php
include('site.php');
?>
Save this in your web root.
Step 3 - You'll need to add a function/bit of code to the start of site.php that parses the 'rewritten' URL (/a/bcd/eg/gh) into the normal variables expected by PHP (?a=bcd&eg=gh). This is a fairly big chunk of code, a bit like the following. Note that you'll need to define WEB_ROOT (e.g. define('WEB_ROOT', '/');), which will probably be just '/' for most sites:
$req_path = '';
if (isset($_SERVER['HTTP_SCRIPT_URL']))
{
    // IIS Rewrite on Windows
    $req_path = $_SERVER['HTTP_SCRIPT_URL'];
}
else if (isset($_SERVER['REQUEST_URI']))
{
    // Apache
    $req_path = $_SERVER['REQUEST_URI'];
}

// Strip the fake script path, leaving e.g. 'a/bcd/eg/gh'
$fake_path = WEB_ROOT . 'site/';
$additional_path = str_replace($fake_path, '', $req_path);

// Split off any real query string at the first '?' or '&'
// (strpos() returns false when the character is not found)
$query_string = '';
$first_ampersand = strpos($additional_path, '&');
$first_question = strpos($additional_path, '?');
if ($first_ampersand !== false || $first_question !== false)
{
    if ($first_ampersand !== false && $first_question !== false)
    {
        $first_querymarker = min($first_ampersand, $first_question);
    }
    else if ($first_ampersand !== false)
    {
        $first_querymarker = $first_ampersand;
    }
    else
    {
        $first_querymarker = $first_question;
    }
    $query_string = substr($additional_path, $first_querymarker + 1);
    $additional_path = substr($additional_path, 0, $first_querymarker);
}

// Read the remaining path as name/value pairs: a/bcd/eg/gh
$aDir = explode('/', $additional_path);
$num_dir = count($aDir);
if ($num_dir > 1)
{
    for ($i = 0; $i < $num_dir; $i = $i + 2)
    {
        if (strlen($aDir[$i]) && array_key_exists($i + 1, $aDir))
        {
            $var_name = urldecode($aDir[$i]);
            $var_value = urldecode($aDir[$i + 1]);
            $_GET[$var_name] = $var_value;
            $_REQUEST[$var_name] = $var_value;

            // Also append the pair to the rebuilt query string
            if (! strlen($query_string))
            {
                $query_string .= $aDir[$i] . '=' . $aDir[$i + 1];
            }
            else
            {
                $query_string .= '&' . $aDir[$i] . '=' . $aDir[$i + 1];
            }
        }
    }
}

// Merge everything into the variables PHP scripts normally use
parse_str($query_string, $GET_NEW);
$_GET = array_merge($_GET, $GET_NEW);
$_REQUEST = array_merge($_REQUEST, $GET_NEW);
$_SERVER['QUERY_STRING'] = $query_string;
You may need to modify the code a little, but on the whole, it should be pretty robust. The remainder of your site.php script should then be able to run with all the variables it's expecting.
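Because the Step 3 code writes straight into $_GET, $_REQUEST and $_SERVER, it is awkward to test on its own. One option, sketched here under the assumption of PHP 8, is to pull the parsing into a pure function and assign its result afterwards. parse_fake_path() is a hypothetical name; it follows the same name/value-pair convention, with path values taking precedence over a real query string, but unlike str_replace() it only strips the fake path when the request actually starts with it.

```php
<?php
// Hypothetical, self-contained version of the Step 3 parsing: turn a
// requested path such as /site/a/bcd/eg/gh?x=1 into an array of
// parameters without touching $_GET or $_SERVER.
function parse_fake_path(string $req_path, string $fake_path): array
{
    // Keep only what follows the fake script path
    if (strpos($req_path, $fake_path) === 0) {
        $req_path = substr($req_path, strlen($fake_path));
    }

    // Split off a real query string at the first '?' or '&'
    $query = '';
    $pos = strcspn($req_path, '?&');
    if ($pos < strlen($req_path)) {
        $query = substr($req_path, $pos + 1);
        $req_path = substr($req_path, 0, $pos);
    }

    // Start from the real query string, then let path pairs override it
    $params = [];
    parse_str($query, $params);
    $segments = explode('/', $req_path);
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        if ($segments[$i] !== '') {
            $params[urldecode($segments[$i])] = urldecode($segments[$i + 1]);
        }
    }
    return $params;
}
```

In site.php the result could then be merged much as before, e.g. `$_GET = array_merge($_GET, parse_fake_path($_SERVER['REQUEST_URI'], WEB_ROOT . 'site/'));`.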
Step 4. The last stage is to grab all the output (HTML) that the PHP produces, and rewrite the (local) URLs contained in it, converting the ?, & and = characters in the query string to / delimiters (therefore completing the circle of rewriting). My personal code for this is pretty hefty, so I won't post it here, but you could probably replicate it with a small regular expression, using preg_replace() or something similar in PHP - I'm sure if you search php.net or the web, something suitable will turn up...
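The regular-expression approach suggested above might look something like the following minimal sketch. It assumes links point at site.php with a conventional query string in a double-quoted href attribute, and it produces the same name/value-pair path convention as Step 3; rewrite_links() is a hypothetical name. It does not check that a link is actually local, and a regex over HTML will miss unusual markup, so treat it as a starting point rather than a finished filter.

```php
<?php
// Hypothetical sketch of Step 4: rewrite links such as
// href="/site.php?a=bcd&amp;eg=gh" into href="/site/a/bcd/eg/gh".
function rewrite_links(string $html): string
{
    return preg_replace_callback(
        '/href="([^"]*?)site\.php\?([^"#]*)(#[^"]*)?"/i',
        function (array $m): string {
            // Inside HTML attributes '&' is often written as '&amp;'
            $query = str_replace('&amp;', '&', $m[2]);
            $parts = [];
            foreach (explode('&', $query) as $pair) {
                if ($pair === '') {
                    continue; // skip empty pairs such as a trailing '&'
                }
                // 'a=bcd' becomes 'a/bcd'; a bare 'a' becomes 'a/'
                $nv = explode('=', $pair, 2);
                $parts[] = $nv[0] . '/' . ($nv[1] ?? '');
            }
            // Keep any #fragment (the group is absent when unmatched)
            $fragment = $m[3] ?? '';
            return 'href="' . $m[1] . 'site/' . implode('/', $parts) . $fragment . '"';
        },
        $html
    );
}
```

To apply it to a whole page, the output could be buffered with `ob_start('rewrite_links');` at the top of site.php, so PHP passes the finished HTML through the function before sending it.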
Sorry I can't give you the entire solution, but that should get you most of the way there - there are probably similar articles/tutorials on phpbuilder.com, devshed.com or other similar sites.
Thanks,
Dan
Director - Focalpoint at Web Development and Marketing
16 March 2004 14:46pm
Hi Dan,
This is great. Thank you very much. Just wanted a general direction that I could follow. Really appreciate your time and assistance.
Tracy