  1. Ashley Friedlein Staff

    CEO at Econsultancy

    08 August 2007 13:13pm

    I understand it as the former, but probably with the number of pages anyone could view limited to one before they were prompted to log in. So yes, someone could see the whole thing for free, but would they really bother to try?

    Equally, I guess we could set a cookie so that even if they did come back through another search within a certain time frame, they couldn't access the content without logging in. They would then have to delete their cookies for every page (or reject them) and go via search each time to cobble together the full report. But frankly there are much easier ways to steal our content than that, e.g. via illegal file swapping.
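A minimal sketch of the cookie-metered "first click free" idea described above, under stated assumptions: the search host list, the `free_views` cookie name, the one-page limit and the `decide` function are all hypothetical and are not taken from Econsultancy's site.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration only.
SEARCH_HOSTS = {"www.google.com", "www.google.co.uk"}
FREE_PAGES = 1  # pages viewable before the login prompt


def is_search_referrer(referrer: str) -> bool:
    """True if the Referer header points at a known search engine host."""
    return urlparse(referrer).hostname in SEARCH_HOSTS


def decide(referrer: str, cookies: dict, logged_in: bool = False) -> tuple:
    """Return (action, cookies_to_set) for one request to a paid page."""
    if logged_in:
        return "show", {}
    raw = cookies.get("free_views", "0")
    # A tampered (non-numeric) cookie is treated as an exhausted allowance.
    views = int(raw) if raw.isdigit() else FREE_PAGES
    if is_search_referrer(referrer) and views < FREE_PAGES:
        return "show", {"free_views": str(views + 1)}
    return "login", {}
```

Calling `decide` with a search referrer and an empty cookie dict always returns `"show"`, which is exactly the weakness noted above: a visitor who deletes or rejects the cookie resets the counter.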

    Ashley

  2. Ashley Friedlein Staff

    CEO at Econsultancy

    08 August 2007 14:13pm

    On the "cloaking" thing, there seems to be a small but important difference between serving different 'content' and serving different 'access'. It is wrong to serve different content to Google vs. users, but apparently not wrong to serve different levels of content access?

    As per Adam's link below to Google News' guidelines on allowing access to subscription content, they recommend:

    "The easiest way to do this (allow Google to spider restricted content but not allow users to do so) is to configure your webservers to not serve the registration page to our crawlers (when the User-Agent is "Googlebot")"

    To my uneducated ear this sounds very like 'serving something different to Googlebot than to users' (i.e. cloaking).

    But is this only true for Google News and not the main index? Teddie's mainstream examples further down this thread seem to make it clear that this also applies to the main index. In fact, Teddie's answer seems to me to sum up the best way to do things currently.

    Though we're now scratching our heads about the best combination of deterrents (referrer strings, user agents, cookies, sessions etc.) to uniquely identify a user and prevent as much content theft as possible whilst not over-complicating things...
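The Google News recommendation quoted above reduces to a check on the User-Agent header. A rough sketch, assuming a hypothetical `serve_registration_page` helper that the web application calls before rendering a paid page:

```python
def serve_registration_page(user_agent: str, logged_in: bool) -> bool:
    """Decide whether to show the registration page instead of the content.

    Logged-in users and requests whose User-Agent mentions "Googlebot"
    get the content; everyone else gets the registration page.
    """
    if logged_in:
        return False
    return "googlebot" not in user_agent.lower()


crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"
print(serve_registration_page(crawler, logged_in=False))
print(serve_registration_page(browser, logged_in=False))
```

The header is supplied by the client, so on its own this check is a deterrent rather than proof that the request really comes from Google's crawler.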

    Ashley

  3. Tom Stuart Staff

    Chief Architect at Econsultancy

    17 August 2007 10:16am

    Yes, this is definitely along the lines of what we'd ideally like to do.

    The relevant Google News page (http://www.google.com/support/newspub/bin/answer.py?answer=40543) strongly suggests that they actually don't mind you selectively allowing the Googlebot through a pay wall, which from a publisher's perspective provides the best of both worlds (we get our content indexed while it remains fully protected), but clearly that's not going to create a spectacular user experience for anyone clicking through from a search result.

    However, I'm concerned about the technical logistics of "First Click Free" -- namely how trivial it is to circumvent it. Anyone with a rudimentary grasp of IT can install a browser extension to spoof their HTTP referer, or disable cookies, or even (with a bit more effort) make consecutive HTTP requests appear to come from different IP addresses, and faced with that combination of techniques it becomes impossible to distinguish between someone who's genuinely just landed from a search result and someone who's done 30 seconds of browser configuration and is now trawling the site downloading all the content.

    There's always the old argument that publishers can do no better than provide a mild deterrent, and that anyone who's determined enough to crack the system is essentially welcome to invest the time and effort necessary to do so, but here we're talking about a really small effort versus a really big payoff. It's one thing for, say, the Washington Post to allow First Click Free on the grounds that their content is essentially ephemeral and largely advertising-funded in the first place, but E-consultancy's content has a high and lasting value that we can't afford to compromise so readily.

    Does anyone have any technical insights about how to safely release useful chunks of content to users arriving from search engine results, without also making it very easy for any half-competent user to subvert this system to his advantage? The fundamentally anonymous and stateless nature of the web makes this a very difficult problem to solve reliably. We are extremely keen to provide the richest possible user experience, but preferably not at the expense of our content assets!

  4. Dave Chaffey Silver

    Digital Marketing Consultant, Trainer, Author and Speaker at SmartInsights.com

    22 August 2007 08:57am

    I've come late to this party (holidays!) and can't add much from an SEO POV, but I think there are some other alternatives to consider.

    What’s not really covered is the impact on conversion rate. My gut feel is that of the two models – #1 partial content preview (the 50 words one) and #2 time-limited preview (the Webmasterworld one) #1 would work best for conversion. It is also not subject to tech savvy people getting around the security on #2.

    On the other hand, I must say I have always liked the Webmasterworld model since all their content is indexed so shows up frequently in long tail searches – and option #2 will therefore be much better for awareness / reach.

    So, balancing higher conversion against reach I think option #2 will work best.

    Whichever option you choose, but especially #2, it obviously needs to be flexible enough to revert if the engines introduce a new approach or rule on cloaking, or a tag for subscription content.

    The other aspect not really mentioned, and this is probably option #3, is that all your reports have a 3 or 4 layer hierarchy, essentially chapter:topic:sub-topic. So you could have a mechanism of restricting access at the chapter/topic level – first 50 words maybe, but make all the detailed content sub-topic available, and so exploit the tail, while the full picture (horizontal navigation) isn’t available.

    HTH Dave Chaffey
    www.davechaffey.com

    On 16:02:04 7 August 2007 Ashley wrote:

    I should know this but I'm intrigued to hear from any SEO experts out there what the latest thinking / best practice is for allowing search engines to index restricted content e.g. content that sits behind a pay-access log in, or other barrier?

    We're looking at converting all our current file content (e.g. Word files, PDFs etc.), which are mostly paid-access only research and guides, into XHTML so that we can display them as HTML or allow users to convert them (e.g. to PDF) on the fly. This will also make it easier to syndicate our content, present it on other devices, "reskin" the presentation layer and so on.

    But it also would allow us to make all the contents of a report (e.g. a 200 page Word file) available for indexing by a search engine. In theory this is good because there is a lot of great, niche, content in these documents which would be great for long tail SEO and attracting high-converting traffic.

    But, of course, we wouldn't want the user to actually get access to the full content itself without first paying. Nor would we want all this pay-access content existing in Google's cache.

    I guess, in theory, we could allow the Googlebot and chosen other spiders to index this content as HTML but not allow real humans or other agents to do so. I believe we might be able to use the robots.txt protocol to prevent caching too?

    But in the case of the above we are showing Google something that we are not showing our users - and isn't this cloaking?

    However, Google's Book search seems to work in just this way so Google don't appear to be averse to indexing intellectual property / content in this way but without revealing it all?

    Any thoughts / pointers / experiences welcome...

    Thanks

    Ashley Friedlein
    CEO
    E-consultancy.com

  5. Jim O

    Social media guy at CSC

    22 August 2007 09:50am

    Quote: So you could have a mechanism of restricting access at the chapter/topic level – first 50 words maybe, but make all the detailed content sub-topic available, and so exploit the tail


    Are you suggesting building, for want of a better term, a 'meta directory' that sits above the full content? That could actually be a very useful tool if integrated with a site map and would certainly help search prominence. You could continue to disallow access to deeper content, and really optimise the content of the directory summaries.

  6. Ashley Friedlein Staff

    CEO at Econsultancy

    23 August 2007 10:40am

    Dave's option #3 is indeed an interesting concept and not one that I've seen done anywhere yet. In theory it holds out the opportunity to get the best of both worlds.

    I don't think we'd need a "meta directory" as such, nor would we necessarily have to stick to a hierarchical "visibility" rule based on chapter:topic:sub-topic though that might be the most sensible way to do it in most cases.

    I'd imagine we'd just have a series of tags in our XHTML which marked up the content to say whether it is the type of content which should show only the first 50 words or whether it should all show up under the 'first click free' idea. The nice thing about this is that should we need to change our approach (as Dave points out because of a change in the way the search engines work) then we'd just change our tags (metadata) and the rest would be easy.
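One way the tagging idea might look in practice, as a sketch only: the class names `preview-only` and `first-click-free` and the `visible_text` function are hypothetical, and the 50-word figure is the one mentioned in this thread.

```python
import xml.etree.ElementTree as ET

PREVIEW_WORDS = 50  # the "first 50 words" figure from the thread


def visible_text(section: ET.Element, paid: bool) -> str:
    """Return the text a visitor may see for one tagged section."""
    text = " ".join("".join(section.itertext()).split())
    if paid or section.get("class") == "first-click-free":
        return text
    # Anything else, including untagged sections, only gets a preview.
    words = text.split()
    if len(words) <= PREVIEW_WORDS:
        return text
    return " ".join(words[:PREVIEW_WORDS]) + " ..."


# A tiny stand-in for one report, with two tagged sections.
doc = ET.fromstring(
    "<div>"
    "<div class='preview-only'>" + "summary " * 60 + "</div>"
    "<div class='first-click-free'>" + "detail " * 60 + "</div>"
    "</div>"
)
for section in doc:
    print(section.get("class"), len(visible_text(section, paid=False).split()))
```

Because the rule lives in the markup rather than in the access code, switching a section from one model to the other would only mean changing its class value.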

    Ashley

  7. Dave Chaffey Silver

    Digital Marketing Consultant, Trainer, Author and Speaker at SmartInsights.com

    24 August 2007 07:42am

    I wasn't exactly thinking of a directory, but yes, it would make sense to have a multi-level site map for this deep content, for easier navigation and internal linking purposes. Good idea.

    Dave
