Like so many others, you’ve decided to revisit your business model and paid content looks awfully good at the moment. Running an online subscription service can be very rewarding, but it’s tough.

One of the challenges posed by a paywall is its impact on SEO. Since content is restricted to subscribers, Google can’t spider it, which means it can’t index it either. What can you do about this?

Disclaimer: the technique described here is cloaking. Before deciding whether or not it’s worth implementing, you should educate yourself on how Google treats cloaking. The purpose of this tutorial is not to advocate cloaking, but to explain how it can be used with paid content, since paywalls are (once again) becoming popular with website owners and can have a significant negative impact on SEO.

While plenty of sites, some of them high-profile, employ this technique without penalty from Google, others have reported over the years being dropped completely from Google for doing what’s described here. You’ll want to think long and hard about this and weigh the risks against the potential benefits. Caveat emptor.

The PHP Function

Detecting Googlebot’s visits to your website is quite simple. Here’s the PHP function I use:

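function is_google() {

    // 1. The user-agent must mention Googlebot (case-insensitive match).
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if( stripos($agent, 'Googlebot') === false ) {
        return false;
    }

    // 2. The visitor's IP must reverse-resolve to a hostname containing googlebot.com.
    //    gethostbyaddr() returns the unmodified IP if the lookup fails.
    $hostname = gethostbyaddr($_SERVER['REMOTE_ADDR']);
    return $hostname !== false && stripos($hostname, 'googlebot.com') !== false;

}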

What it does:

1. Checks to see if the user-agent is Googlebot by looking for “Googlebot” in the user-agent.
2. Checks to see if the hostname the IP address resolves to has “googlebot.com” in it.

If a visitor meets both criteria, the function returns true; if a visitor fails either one, the function returns false.

To integrate this with your existing code, simply call the is_google() function at the appropriate point in your logic. For instance, let’s assume you have the following code that displays paid content to a logged-in subscriber and a login/registration form to users who aren’t logged in and/or aren’t subscribers:

if( is_subscriber() && is_logged() ) {
    // show subscriber content
} else {
    // show login/registration form
}

To treat Google like a logged-in subscriber, change the first line to:

if( ( is_subscriber() && is_logged() ) || is_google() ) {

That’s it. When the Googlebot pays you a visit, it will be able to access all of your juicy subscriber content.

Note: this isn’t fool-proof. The “Googlebot” user-agent can easily be spoofed, and someone who controls the reverse DNS for their IP address could set up a hostname with “googlebot.com” in it (e.g. googlebot.com.somedomain.com) and trick you into serving up your content. For a more secure solution, you should use a regular expression to make sure the hostname actually ends in .googlebot.com rather than merely containing it (I’ve skipped the use of regular expressions to do this in the interest of keeping the example here as simple as possible).
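For anyone who wants the stricter check, here’s a minimal sketch of what it might look like. The function names hostname_is_googlebot() and ip_is_googlebot() are hypothetical helpers of my own, not part of the function above. The second one also resolves the hostname back to an IP address and confirms it matches the visitor’s, which closes the reverse DNS loophole. It assumes IPv4, since gethostbynamel() only returns IPv4 addresses.

```php
<?php
// Hypothetical helper: true only if the hostname ends in ".googlebot.com",
// so "googlebot.com.somedomain.com" is rejected.
function hostname_is_googlebot(string $hostname): bool
{
    return preg_match('/\.googlebot\.com$/i', $hostname) === 1;
}

// Hypothetical helper: reverse lookup, suffix check, then forward lookup.
function ip_is_googlebot(string $ip): bool
{
    // gethostbyaddr() returns the unmodified IP when the lookup fails.
    $hostname = gethostbyaddr($ip);
    if ($hostname === false || $hostname === $ip || !hostname_is_googlebot($hostname)) {
        return false;
    }

    // The hostname must resolve back to the same IP (IPv4 only).
    $addresses = gethostbynamel($hostname);
    return $addresses !== false && in_array($ip, $addresses, true);
}
```

In is_google(), the hostname test could then be swapped for a call to ip_is_googlebot($_SERVER['REMOTE_ADDR']). Keep in mind that DNS lookups are slow, so you may want to cache the result per IP address.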

Giving Non-Subscribers a Sample

Of course, the purpose of this technique is not to deceive users. It’s poor form to let Google spider your subscriber content so that it gets indexed but to then show the users Google refers to you a subscription/login page.
Therefore, if you choose to implement this technique, you should be courteous enough to provide a snippet of the paid content so that the user can quickly determine if you’ve got what he or she was looking for. The Wall Street Journal, for instance, does this.

Not only are these types of snippets the decent thing to provide, they will boost your conversions.

There are a number of ways to generate the snippet. There are plenty of functions out there that you can use for taking text and chopping it down to x words or you could simply use PHP’s substr function to do the job.
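As one illustration of the word-based approach, here’s a rough sketch. The make_snippet() name and the 50-word default are my own assumptions, so adjust them to taste. It strips HTML first so the snippet doesn’t end in the middle of a tag.

```php
<?php
// Hypothetical helper: reduce $text to at most $max_words words of plain text.
function make_snippet(string $text, int $max_words = 50): string
{
    // Remove markup, then split on runs of whitespace.
    $plain = trim(strip_tags($text));
    $words = preg_split('/\s+/', $plain, -1, PREG_SPLIT_NO_EMPTY);

    // Short enough already: return it whole, with whitespace normalized.
    if (count($words) <= $max_words) {
        return implode(' ', $words);
    }

    // Otherwise keep the first $max_words words and mark the cut.
    return implode(' ', array_slice($words, 0, $max_words)) . '...';
}
```

Cutting on word boundaries reads better than a raw substr(), which can stop in the middle of a word.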

Depending on your goals and how your site is set up, you may only want to show the snippet to users who have been referred to you through Google. Using the code above as an example:

if( is_subscriber() && is_logged() ) {
    // show subscriber content
} else {
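    // The referer header is optional, so default to an empty string.
    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    if( stripos($referer, 'google') !== false ) {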
        // show login/registration form *with* snippet
    } else {
        // show login/registration form without snippet
    }
}
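Matching “google” anywhere in the referer will also catch URLs like example.com/?q=google. If that bothers you, a slightly tighter check might look only at the referer’s hostname, as in the sketch below. The is_google_referral() name and the domain pattern are my own assumptions; the pattern is deliberately loose and does not cover every Google domain. The referer is also sent by the browser and can be faked, so treat this as a convenience, not a security measure.

```php
<?php
// Hypothetical helper: true if the referer's host looks like a Google domain
// such as www.google.com, google.de or www.google.co.uk.
function is_google_referral(?string $referer): bool
{
    if ($referer === null || $referer === '') {
        return false;
    }

    // parse_url() yields null or false when no host can be extracted.
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // "google." at the start of the host or after a dot, then a short suffix.
    return preg_match('/(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$/i', $host) === 1;
}
```

You would call it as is_google_referral($_SERVER['HTTP_REFERER'] ?? null) in place of the substring test.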

Making Sure Your Subscriber Content Doesn’t Get Cached

You’re almost there. There’s one last thing left to do: make sure Google doesn’t cache your subscriber content, which would quickly defeat the purpose of this entire exercise. To keep Google from making your subscriber content available through its cache, add the following meta tag between the HTML <head> and </head> tags on pages that contain subscriber content:

<meta name="robots" content="noarchive">

Conclusion

Again, this is a grey hat SEO technique that, while used successfully by quite a few sites, still carries risk. That said, given the resurgence of paid content as a business model and the negative effects a paywall can have on SEO, it may be worth a look depending on your risk tolerance.

It’s my hope that Google will eventually offer a means for operators of subscription websites to have their content indexed (and marked as such) as it does with Google News, which would eliminate the need for owners of subscription websites to turn to cloaking.

Photo credit: mythwhisper via Flickr.