Google continues to fight scrapers, turns to public for help

If you're a publisher, one of the most frustrating experiences is discovering that your content is being scraped by a third party that does not have permission to use it.

Even more frustrating: when that scraper's website is able to outrank yours for searches related to your own content.

For obvious reasons then, Google has engaged in a considerable effort to thwart scrapers. And now it's turning to the public for additional assistance.

Last week, Google's Matt Cutts put out an RFS (request for scrapers) on his Twitter account:

Scrapers getting you down? Tell us about blog scrapers you see: http://goo.gl/S2hIh We need datapoints for testing.

As Matt McGee of Search Engine Land notes, Google's Panda 2.2 update, which was released earlier this year, was designed to address scraper sites. But that, not surprisingly, didn't end the war.

Despite Google's best efforts, there are still scraper sites that rank well, and sometimes they even rank higher than the site that originally published the content.

So what should Google do?

On one hand, it would be curious if Google's web spam team wasn't looking at scrapers specifically. But on the other, it's hard not to make the argument that scrapers aren't the core problem -- they're just a symptom.

In many cases, scrapers aren't simply scraping content and hoping that their sites rank well. In the absence of a means to build PageRank legitimately, they also employ black and gray hat techniques that Google has struggled to deal with.

In other words, scraping alone is frustrating, but it shouldn't be infuriating. What's really infuriating is that scrapers are often able to take advantage of greater flaws in Google's ranking algorithms so that scraped content ranks meaningfully at all.

From this perspective, for Google to win the war against scrapers, it must win the war against search engine scamsters. And that won't be easy.

Patricio Robles is a tech reporter at Econsultancy. Follow him on Twitter.


Reader comments (3)

  1. Nick Stamoulis

    3:27PM on 30th August 2011

    Seeing those scrapers succeed is usually the reason site owners decide to go black hat in the first place. "It works for them and they get away with it, so why should I bother to stay white hat?" I can understand that frustration because it constantly feels like you're fighting an uphill battle.

  2. Matthew Read

    4:20PM on 30th August 2011

    I recently published a piece of content online and the next day 6 other sites had the same article up! What annoys me is that they then shove it full of Google Ads and get paid!

    The Panda update did hit a lot of people but so many avoided the drop with simple changes and it will be interesting to see if Google release a 3rd Panda update to combat them further.

  3. Max Webster-Dowsing

    SEO Consultant at RBS Insurance

    9:33AM on 31st August 2011

    Anyone who engages in SEO is never strictly white-hat; after all, we are after links, which we have to create by our own means. All that will happen is that people who scrape content will just spin it, i.e. replace words in each phrase of the article with synonyms, which in turn makes it unique content in the eyes of the search engines. People with half a brain do not use directly scraped content if they want good rankings; scraped content will only rank well if the site is an aged site with some PageRank and unique content, so it is mixed up a bit. Unfortunately there will always be holes in Google's algorithm.
