Kudos to Rand Fishkin and the team at SEOmoz following the announcement that they have spent a year building an index of some 30 billion web pages.

The ambitious year-long experiment should help the team provide more insight into how search engines make sense of the web. It looks pretty interesting from where we’re sitting, and we’re looking forward to playing with it.

Rand says: “For too long, data that is essential to the practice of search engine optimization has been inaccessible to all but a handful of search engineers. Professional SEOs and site owners of all kinds deserve to know more about how their properties are being referenced in such a system.”

There are various caveats about the data due to processing limitations and the differences between SEOmoz’s crawlers and those employed by the major search engines, but the index (the ‘Randex’, anybody?) will nevertheless reveal some interesting secrets.

And here are a few to whet your appetite:

  • 58% of all links are to internal pages on the same domain, while 42% point to pages off the linking site.
  • The typical web page accrues 32 links (internal and external), so any of your pages with more than 32 links is ‘above average’.
  • 1.83% of all links on the web are nofollowed. Of these, 61% are external-pointing, while 39% link to pages on their own site. In all, more than 2 billion links are having their link juice turned off by savvy search marketers.
  • Twice as many web pages employ 302 (temporary) redirects as 301 (permanent) redirects (read what Matt Cutts says about 301 vs 302).
  • 1.5% of web pages use the meta noindex tag to tell search engines not to include them in their index.
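
For anyone wondering how figures like these are tallied, here is a minimal sketch in Python that counts internal, external and nofollowed links on a single page and checks for a meta noindex tag. It is not SEOmoz's method: the `LinkStats` class, the sample HTML and the example.com URLs are all made up for illustration, and 'internal' is simplified to mean 'same hostname', which will not match a crawler that groups subdomains under one domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkStats(HTMLParser):
    """Hypothetical helper: tally link types on one HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc.lower()
        self.internal = 0
        self.external = 0
        self.nofollow = 0
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page's own URL.
            target = urlparse(urljoin(self.page_url, attrs["href"]))
            if target.scheme not in ("http", "https"):
                return  # skip mailto:, javascript: and similar
            # Assumption: "internal" means the hostname matches exactly.
            if target.netloc.lower() == self.host:
                self.internal += 1
            else:
                self.external += 1
            # rel can hold several space-separated values.
            if "nofollow" in (attrs.get("rel") or "").lower().split():
                self.nofollow += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


# Made-up sample page for illustration only.
SAMPLE = """
<html><head><meta name="robots" content="noindex, follow"></head>
<body>
<a href="/about">About</a>
<a href="http://www.example.com/blog">Blog</a>
<a href="http://other.example.org/" rel="nofollow">Elsewhere</a>
<a href="mailto:hi@example.com">Mail</a>
</body></html>
"""

stats = LinkStats("http://www.example.com/")
stats.feed(SAMPLE)
total = stats.internal + stats.external
print(f"internal: {stats.internal}/{total}, external: {stats.external}/{total}")
print(f"nofollowed: {stats.nofollow}, noindex: {stats.noindex}")
```

Run across billions of pages rather than one, the same kind of counters would produce percentages like those quoted above, though a real crawler also has to deal with redirects, duplicate URLs and malformed markup.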

To help users drill down into the detail, SEOmoz has launched ‘Linkscape’, a link research tool that provides advanced link information such as anchor text, number of unique links, number of links from unique domains, and so on.

The tool also uses proprietary metrics such as mozRank and mozTrust (and no prizes for guessing what the thinking was there) to help SEOers figure out the relative value of links from web pages and domains.

Early days, but this looks like another top idea from SEOmoz… go check it out.
