Google Addresses Duplicate Content

Jonathan Bailey, December 21, 2006


SEO gurus often take a great deal of interest in plagiarism and content theft issues out of fear of being hit with the duplicate content penalty. Under such a penalty, Google penalizes sites with identical information by reducing their search ranking and their exposure.

However, like most things dealing with Google’s search rankings, the duplicate content penalty has been shrouded in mystery. Whether it exists, what exactly constitutes duplicate content and how the penalty works are all questions that have gone largely unanswered.

Earlier this week, though, Google posted an entry on its Official Google Webmaster Blog dealing with the issue of duplicate content, and it provided some interesting, if contradictory, answers to the puzzle.

What’s Clear

The post deals with many different aspects of the duplicate content penalty and answers several questions outright. Some of the highlights include:

The use of 301 redirects is a good idea.
Quotes and citations will not be considered duplicate content.
It is a good idea to prevent Google from indexing print-friendly pages or other duplicate pages.
Translated works are not considered duplicate content.
In the vast majority of cases, Google simply filters out content it feels to be duplicate rather than penalizing sites’ rankings. Penalties are reserved for cases where Google believes there was intentional manipulation.
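
For readers who want to see what the first and third points can look like in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions, not Google's method or any particular server's configuration: the URL mapping, the /print/ prefix and the function names resolve and build_robots_txt are all hypothetical. It maps duplicate URLs to a preferred URL with a 301 (permanent redirect) status and builds a robots.txt that asks crawlers to stay out of print-friendly pages, then checks the rules with Python's standard robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical mapping of duplicate URLs to the preferred URL.
REDIRECTS = {
    "/index.html": "/",
    "/old-article": "/new-article",
}

# Hypothetical path prefix under which print-friendly copies live.
PRINT_PREFIX = "/print/"


def resolve(path):
    """Return (status, location) for a requested path.

    Known duplicates get a 301 (permanent redirect) to the preferred URL;
    every other path is served as-is with a 200.
    """
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 200, path


def build_robots_txt(disallowed_prefixes):
    """Build robots.txt text that disallows the given path prefixes for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    return "\n".join(lines) + "\n"


# Build the rules and load them into the standard-library parser to check them.
robots = build_robots_txt([PRINT_PREFIX])
parser = RobotFileParser()
parser.parse(robots.splitlines())

if __name__ == "__main__":
    print(resolve("/index.html"))
    print(robots)
    print(parser.can_fetch("Googlebot", "/print/article-1"))
```

In a real site the redirect would normally be configured in the web server or the blogging platform rather than in application code, but the idea is the same: one preferred URL per piece of content, with the duplicates either pointing to it or kept out of the crawler's reach.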

Though this is all great news, there are no major surprises. Most of this has been known or at least assumed for some time. What’s more interesting is when Google tries to delve into the issue of duplicate content and scraping.

In its attempt to make things clearer, Google actually made a pair of seemingly contradictory statements that might just add more fuel to the fire.

What’s Less Clear

In one of the last paragraphs, Google directly addresses the issue of scraping and spam blogging. It says:

Don’t worry be happy: Don’t fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it’s highly unlikely that such sites can negatively impact your site’s presence in Google. If you do spot a case that’s particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.

That definitely seems like great news, and it puts a smile on my face since it might mean a sharp reduction in the amount of scraping. However, a few lines up, the post makes a seemingly contradictory statement.

Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we’ll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you’d prefer.

This seems odd because spam blogs are basically illegal syndication sites that do not include links to the original work. On one hand, Google claims that it can distinguish between originals and copies, making it “unlikely” that such sites will hurt your ranking. On the other, it says that legal syndication sites without links might.

However, even if the “Don’t Worry” statement is taken at face value, it isn’t likely to do much for people who make their livelihoods off of Google’s rankings. “Highly unlikely” doesn’t mean that scrapers won’t hurt your rankings, just that they probably won’t.

If there is any chance of a serious problem, most will act to prevent it, even if the odds are relatively small.

Conclusions

Personally, I’ve found that SEO is just one reason to worry about content theft. Creating a unique experience and maintaining your rights over the long term are equally important, if not more important, reasons to deal with scrapers swiftly.

Still, SEO is an important reason to look at dealing with scrapers and, though Google’s attempt to calm concerns was definitely welcome, its mixed signals have failed to put my mind at ease.

The problem is that if scraping truly did not work, no one would be doing it. It has to be working for someone or it wouldn’t be so popular, or such an issue for so many people with RSS feeds.

Despite that, I take great relief that Google is aware of the problem, is addressing it and has a system in place to thwart most such attempts. Hopefully, in the long run, that will discourage scraping and promote original content, healthy syndication and all-around better content use.

It might be a fantasy, but it’s a hope that I still hold.

Tags: Content Theft, Copyright, Copyright Infringement, Copyright Law, DMCA, Google, Scraping, Search Engine, SEO, Splogging, Splogs
