Google Addresses Duplicate Content

By Jonathan Bailey • Dec 21st, 2006 • Category: Articles, DMCA, Legal Issues, Prevention

SEO gurus often take a great deal of interest in plagiarism and content theft issues out of fear of being hit with the duplicate content penalty. Under such a penalty, Google penalizes sites with identical information by reducing their search ranking and their exposure.

However, like most things dealing with Google’s search rankings, the duplicate content penalty has been shrouded in mystery. Whether it exists, what exactly constitutes duplicate content and how the penalty works are all questions that have gone largely unanswered.

Earlier this week, though, Google posted an entry on its Official Google Webmaster Blog dealing with the issue of duplicate content, and it provided some interesting, if contradictory, answers to the puzzle.

What’s Clear

The post deals with many different aspects of the duplicate content penalty and answers several questions outright. Some of the highlights include:

  • The use of 301 redirects is a good idea.
  • Quotes and citations will not be considered duplicate content.
  • It is a good idea to prevent Google from indexing print-friendly pages or other duplicate pages.
  • Translated works are not considered duplicate content.
  • In the vast majority of cases, Google simply filters out content it feels to be duplicate rather than penalizing sites’ rankings. Penalties are reserved for cases where Google believes there was intentional manipulation.
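
To make the print-friendly point concrete, here is a minimal sketch of one common way to keep crawlers away from duplicate pages: a robots.txt rule, checked with Python's standard robots.txt parser. The `/print/` path, the `example.com` URLs and the `is_crawlable` helper are all assumptions made for illustration, not anything taken from Google's post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its print-friendly
# copies under /print/ (the path is an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The regular article stays crawlable; its print copy does not.
    print(is_crawlable("https://example.com/articles/duplicate-content"))
    print(is_crawlable("https://example.com/print/duplicate-content"))
```

A check like this only confirms what the rules say. Whether a given crawler honors them, and how quickly already-indexed pages drop out, is up to the search engine.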

Though this is all great news, there are no major surprises. Most of this has been known or at least assumed for some time. What’s more interesting is when Google tries to delve into the issue of duplicate content and scraping.

In its attempt to make things clearer, Google actually made a pair of seemingly contradictory statements that might just add more fuel to the fire.

What’s Less Clear

In one of the last paragraphs, Google directly addresses the issue of scraping and spam blogging. It says:

Don’t worry be happy: Don’t fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it’s highly unlikely that such sites can negatively impact your site’s presence in Google. If you do spot a case that’s particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.

That definitely seems like great news, and it puts a smile on my face since it might mean a sharp reduction in the amount of scraping. However, a few lines up, the post makes a seemingly contradictory statement:

Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we’ll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer.

This seems odd because spam blogs are basically illegal syndication sites that do not include links to the original work. On one hand, Google claims that it can distinguish between originals and copies, making it “unlikely” that such sites will hurt your ranking. On the other, it says that legitimate syndication sites without links might.

However, even if the “Don’t Worry” statement is taken at face value, it isn’t likely to do much for people who make their livelihoods off of Google’s rankings. “Highly unlikely” doesn’t mean that scrapers won’t hurt your rankings, just that they probably won’t.

If there is any chance of a serious problem, most will act to prevent it, even if the odds are relatively small.

Conclusions

Personally, I’ve found that SEO is just one reason to worry about content theft. Creating a unique experience and long-term maintenance of your rights are equally if not more important reasons to deal with scrapers swiftly.

Still, SEO is an important reason to look at dealing with scrapers and, though Google’s attempt to calm concerns was definitely welcome, its mixed signals have failed to put my mind at ease.

The problem is that if scraping truly did not work, no one would be doing it. It has to be working for someone or it wouldn’t be so popular and an issue for so many with RSS feeds.

Despite that, I take great relief in knowing that Google is aware of the problem, is addressing it and has a system in place to thwart most such attempts. Hopefully, in the long run, that will discourage scraping and promote original content, healthy syndication and all-around better content use.

It might be a fantasy, but it’s a hope that I still hold.

Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

7 Responses »

  1. Congratulations!

    I saw your site listed on 9rules

  2. [...] I would say that it hasn’t made it to the web in a major way given the ineffectiveness of automated translation tools, but Google’s hint about translated text not being duplicate content may indicate a growing problem in this field. [...]

  3. [...] Though warnings may not stop scraping or even effectively warn users, they can defend against one of the fears that comes with scraping, search engine penalties. [...]

  4. [...] you talk a great game about being able to detect duplicate content, but yet junk content is still getting through and it [...]

  5. [...] Google has addressed that and said that it should not be a major concern, I have worked with several businesses that have had [...]

  6. [...] nsfw) as it pertains to scraping, one penalty is certain, increased competition. Even if there is no algorithmic “penalty” placed on your site, the plagiarists will still show up for in your keyword results. For example, if you had a keyword [...]

  7. [...] please consider subscribing to my RSS feed. Thank you for visiting!It appears that Google’s push to handle duplicate content may be having an unintended side [...]
