<?xml version="1.0" encoding="UTF-8"?>
<blog-post>
  <author-id type="integer">42244</author-id>
  <blog-comments-count type="integer">1</blog-comments-count>
  <blog-post-status-id type="integer">3</blog-post-status-id>
  <body-format>econsultancy_xml</body-format>
  <body-formatted>
  &lt;p&gt;Duplicate content refers to blocks of text on the same site, or across domains, which are either exactly the same or very similar. &lt;/p&gt;
  &lt;p&gt;
    &lt;a href="http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html"&gt;According to Lasnik,&lt;/a&gt; most instances are &lt;em&gt;"unintentional or at least not malicious in origin"&lt;/em&gt;, but in other cases content is copied across domains in a bid to manipulate search rankings and generate more traffic.&lt;/p&gt;
  &lt;p&gt;&lt;a href="http://www.google.co.uk/"&gt;Google&lt;/a&gt; focuses on the issue because it wants to provide better search results: users want to see a range of distinct pages when they search for something.&lt;/p&gt;
  &lt;p&gt;To provide this, search engines have to contend with spammers using invisible or irrelevant content,&#160;scraper sites that&#160;use other sites' content to&#160;obtain AdSense clicks, and a range of other rubbish. &lt;/p&gt;
  &lt;p&gt;
    &lt;strong&gt;Duplicate content penalties&lt;br /&gt;&lt;/strong&gt;According to Lasnik, duplicate content is normally dealt with by filtering adjustments - if your site has regular and printer versions of a webpage, then Google will list just one version of the page. &lt;/p&gt;
  &lt;p&gt;When the duplicate content is clearly an attempt to manipulate rankings, Google says it will ...&lt;/p&gt;
  &lt;blockquote&gt;
    &lt;p&gt;
      &lt;em&gt;... "make appropriate adjustments in the indexing and ranking of the sites involved."&lt;/em&gt;
    &lt;/p&gt;
  &lt;/blockquote&gt;
  &lt;p&gt;
    &lt;strong&gt;
      &lt;em&gt;Here are some of&#160;Lasnik's&#160;tips for webmasters:&lt;/em&gt;
    &lt;/strong&gt;
  &lt;/p&gt;
  &lt;ul&gt;
    &lt;li&gt;
      &lt;strong&gt;Block the version you don't want Google to index&lt;/strong&gt; - disallow the relevant directories, or use pattern matching, in your robots.txt file.&lt;/li&gt;
    &lt;li&gt;
      &lt;strong&gt;Take care when syndicating content&lt;/strong&gt; - make sure other sites that use your content include a link back to the original source. Google will always show the version it thinks is most relevant to a given search, which may not be the one you would prefer.&lt;/li&gt;
    &lt;li&gt;
      &lt;strong&gt;Use TLDs&lt;/strong&gt; - use country-specific top-level domains (such as .de for German content) to handle country-specific content.&lt;/li&gt;
    &lt;li&gt;
      &lt;strong&gt;Use smaller boilerplates&lt;/strong&gt; - minimise boilerplate repetition by including a brief summary of copyright text at the bottom of each page, and linking to a single page with the full details.&lt;/li&gt;
    &lt;li&gt;
      &lt;strong&gt;Don't worry about scraper sites&lt;/strong&gt; - Lasnik says that webmasters shouldn't be too concerned about these sites, as they are &lt;em&gt;"highly unlikely"&lt;/em&gt; to adversely affect your rankings.&lt;/li&gt;
  &lt;/ul&gt;
  &lt;p&gt;
    &lt;em&gt;Also see our &lt;/em&gt;
    &lt;a href="http://econsultancy.com/reports/search-engine-optimization-seo-best-practice-guide-2007"&gt;
      &lt;em&gt;Search Engine Optimisation (SEO) - Best Practice Guide&lt;/em&gt;
    &lt;/a&gt;
    &lt;em&gt;&#160;for more info on&#160;the issue.&lt;/em&gt;&#160;&lt;/p&gt;
</body-formatted>
  <body-unformatted>&lt;FormattedContent xmlns="http://www.e-consultancy.com/schema/formattedContent/"&gt;
  &lt;Paragraph&gt;Duplicate content refers to blocks of text on the same site, or across domains, which are either exactly the same or very similar. &lt;/Paragraph&gt;
  &lt;Paragraph&gt;
    &lt;Link URL="http://googlewebmastercentral.blogspot.com/2006/12/deftly-dealing-with-duplicate-content.html" Window="New"&gt;According to Lasnik,&lt;/Link&gt; most instances are &lt;Quote&gt;"unintentional or at least not malicious in origin"&lt;/Quote&gt;, but in other cases content is copied across domains in a bid to manipulate search rankings and generate more traffic.&lt;/Paragraph&gt;
  &lt;Paragraph&gt;&lt;Link URL="http://www.google.co.uk/" Window="New"&gt;Google&lt;/Link&gt; focuses on the issue because it wants to provide better search results: users want to see a range of distinct pages when they search for something.&lt;/Paragraph&gt;
  &lt;Paragraph&gt;To provide this, search engines have to contend with spammers using invisible or irrelevant content,&#160;scraper sites that&#160;use other sites' content to&#160;obtain AdSense clicks, and a range of other rubbish. &lt;/Paragraph&gt;
  &lt;Paragraph&gt;
    &lt;Emphasis&gt;Duplicate content penalties&lt;LineBreak /&gt;&lt;/Emphasis&gt;According to Lasnik, duplicate content is normally dealt with by filtering adjustments - if your site has regular and printer versions of a webpage, then Google will list just one version of the page. &lt;/Paragraph&gt;
  &lt;Paragraph&gt;When the duplicate content is clearly an attempt to manipulate rankings, Google says it will ...&lt;/Paragraph&gt;
  &lt;Block&gt;
    &lt;Paragraph&gt;
      &lt;Quote&gt;... "make appropriate adjustments in the indexing and ranking of the sites involved."&lt;/Quote&gt;
    &lt;/Paragraph&gt;
  &lt;/Block&gt;
  &lt;Paragraph&gt;
    &lt;Emphasis&gt;
      &lt;Quote&gt;Here are some of&#160;Lasnik's&#160;tips for webmasters:&lt;/Quote&gt;
    &lt;/Emphasis&gt;
  &lt;/Paragraph&gt;
  &lt;List Type="Disc"&gt;
    &lt;ListItem&gt;
      &lt;Emphasis&gt;Block the version you don't want Google to index&lt;/Emphasis&gt; - disallow the relevant directories, or use pattern matching, in your robots.txt file.&lt;/ListItem&gt;
    &lt;ListItem&gt;
      &lt;Emphasis&gt;Take care when syndicating content&lt;/Emphasis&gt; - make sure other sites that use your content include a link back to the original source. Google will always show the version it thinks is most relevant to a given search, which may not be the one you would prefer.&lt;/ListItem&gt;
    &lt;ListItem&gt;
      &lt;Emphasis&gt;Use TLDs&lt;/Emphasis&gt; - use country-specific top-level domains (such as .de for German content) to handle country-specific content.&lt;/ListItem&gt;
    &lt;ListItem&gt;
      &lt;Emphasis&gt;Use smaller boilerplates&lt;/Emphasis&gt; - minimise boilerplate repetition by including a brief summary of copyright text at the bottom of each page, and linking to a single page with the full details.&lt;/ListItem&gt;
    &lt;ListItem&gt;
      &lt;Emphasis&gt;Don't worry about scraper sites&lt;/Emphasis&gt; - Lasnik says that webmasters shouldn't be too concerned about these sites, as they are &lt;Quote&gt;"highly unlikely"&lt;/Quote&gt; to adversely affect your rankings.&lt;/ListItem&gt;
  &lt;/List&gt;
  &lt;Paragraph&gt;
    &lt;Quote&gt;Also see our &lt;/Quote&gt;
    &lt;Link URL="http://econsultancy.com/reports/search-engine-optimization-seo-best-practice-guide-2007" Window="New"&gt;
      &lt;Quote&gt;Search Engine Optimisation (SEO) - Best Practice Guide&lt;/Quote&gt;
    &lt;/Link&gt;
    &lt;Quote&gt;&#160;for more info on&#160;the issue.&lt;/Quote&gt;&#160;&lt;/Paragraph&gt;
&lt;/FormattedContent&gt;</body-unformatted>
  <created-at type="datetime">2007-06-22T08:03:00+01:00</created-at>
  <enabled-blog-comments-count type="integer">1</enabled-blog-comments-count>
  <expertise-level-id type="integer">1</expertise-level-id>
  <extract-format>econsultancy_xml</extract-format>
  <extract-formatted>
  &lt;p&gt;
    &lt;strong&gt;The issue of duplicate content is a thorny one that can affect sites' search rankings, and one that has even &lt;a href="/blog/634-google-in-duplicate-content-shocker"&gt;caught&lt;/a&gt; Google out in the past.&lt;/strong&gt;
  &lt;/p&gt;
  &lt;p&gt;So it's good that the search giant's Adam Lasnik has written a post that throws some light on&#160;how it deals with the problem.&lt;/p&gt;
</extract-formatted>
  <extract-unformatted>&lt;FormattedContent xmlns="http://www.e-consultancy.com/schema/formattedContent/"&gt;
  &lt;Paragraph&gt;
    &lt;Emphasis&gt;The issue of duplicate content is a thorny one that can affect sites' search rankings, and one that has even &lt;Link URL="/blog/634-google-in-duplicate-content-shocker" Window="New"&gt;caught&lt;/Link&gt; Google out in the past.&lt;/Emphasis&gt;
  &lt;/Paragraph&gt;
  &lt;Paragraph&gt;So it's good that the search giant's Adam Lasnik has written a post that throws some light on&#160;how it deals with the problem.&lt;/Paragraph&gt;
&lt;/FormattedContent&gt;</extract-unformatted>
  <featured type="boolean">false</featured>
  <id type="integer">1390</id>
  <learn-more-formatted nil="true"></learn-more-formatted>
  <learn-more-unformatted nil="true"></learn-more-unformatted>
  <legacy-article-id type="integer">363629</legacy-article-id>
  <name>Google on duplicate content</name>
  <private type="boolean">false</private>
  <published-at type="datetime">2007-06-22T15:51:00+01:00</published-at>
  <slug>google-on-duplicate-content</slug>
  <tweetbacks-updated-at type="datetime">2009-04-28T22:52:15+01:00</tweetbacks-updated-at>
  <unpublished-at type="datetime" nil="true"></unpublished-at>
  <updated-at type="datetime">2009-04-28T22:52:15+01:00</updated-at>
  <views-count type="integer">262</views-count>
</blog-post>
