If you’re a publisher, one of the most frustrating experiences is discovering that your content is being scraped by a third party that does not have permission to use it.
Even more frustrating: when that scraper’s website is able to outrank yours for searches related to your own content.
For obvious reasons, then, Google has made a considerable effort to thwart scrapers. And now it’s turning to the public for additional assistance.
Last week, Google’s Matt Cutts put out an RFS (request for scrapers) on his Twitter account:
Scrapers getting you down? Tell us about blog scrapers you see: http://goo.gl/S2hIh We need datapoints for testing.
As Matt McGee of Search Engine Land notes, Google’s Panda 2.2 update, which was released earlier this year, was designed to address scraper sites. But that, not surprisingly, didn’t end the war.
Despite Google’s best efforts, there are still scraper sites that rank well, and sometimes they even rank higher than the site that originally published the content.
So what should Google do?
On one hand, it would be curious if Google’s web spam team weren’t looking at scrapers specifically. But on the other, it’s hard not to argue that scrapers aren’t the core problem; they’re just a symptom.
In many cases, scrapers aren’t simply scraping content and hoping that their sites rank well. In the absence of a means to build PageRank legitimately, they also employ black and gray hat techniques that Google has struggled to deal with.
In other words, scraping alone is frustrating, but it shouldn’t be infuriating. What’s really infuriating is that scrapers can often exploit deeper flaws in Google’s ranking algorithms, and those flaws are the reason scraped content ranks meaningfully at all.
From this perspective, for Google to win the war against scrapers, it must win the war against search engine scamsters. And that won’t be easy.