Warning Headers

If you are not reading this article on plagiarismtoday.com, it is being scraped and the site you are reading this on is guilty of copyright infringement.

The above paragraph is not necessarily true, especially with my Creative Commons license, but many blogs have started putting similar warnings in the leads and footers of their articles in a bid to dissuade scrapers from taking their content. Undoubtedly, if you read many blogs, you’ve seen one or two yourself.

The question, though, is whether the tactic is effective or worthwhile. Though the headers do produce amusing results from time to time, do they do anything to actually stop scraping and protect content?

Sadly, the answer appears to be no, though there may yet be some merit to the strategy.

The Premise

The idea, as explained by Elaine Vigneault, is simple enough. Include, in your first paragraph, information about your domain and a note that any other site is probably a scraper. Then, when the scraper picks up the article, either in whole or summary, they’ll display the warning alerting any viewers of the scraped site to the infringement.

This, in turn, draws people to the original site and hurts the ability of the spammer to exploit the content.

An alternative to the plan calls for special periodic posts, such as this one on Dating Dames, that call out scrapers and identify scraping sites. Such posts also often ask readers for help in spotting scrapers.

All in all, both seem like foolproof plans: easy, effective and fast ways to shame scrapers. However, it rarely works out that way, not because the idea is bad, but because the dynamics of scraping will not allow it to work.

The Problems

Though putting warning headers and/or warning posts in a feed seems like a good idea, there are several problems with doing it.

  1. Most Scraping is Automated – Scraping is a “set and forget” operation. Most spammers don’t even look at their own sites once they are set up. They will never know that they have been shamed or take any action to rectify it. In short, the scraping doesn’t stop.
  2. Headers Not Always Grabbed – Most keyword scraping is done by using feeds from sites like Technorati and Google Blog Search. Those feeds only pull a few sentences of your content and only the few around the desired keyword. That, more often than not, is from the middle of the post and will almost never include a warning header. Furthermore, any whole posts dedicated to the issue will be ignored by this method.
  3. Not Meant for Human Eyes – The vast majority of scraped sites are not intended for human visitors, but to build up search engine reputation for other sites that appear more legitimate. If almost no people see them, then the warning does no good.
  4. Warnings Are Ignored – Most of the time, it’s easy to tell that a scraper site isn’t legitimate. The amateurish layout and overly dense keywords are instant clues. Only once, out of the dozens of warnings I’ve seen, was I surprised to learn that the site was a scraper rather than the original. Even then, most people don’t take the time to look up at their address bar. Furthermore, some scrapers use confusing addresses, such as subdomains, that can make the site appear legitimate even with the warning.
  5. Negative Impact on Viewers – These warnings can cause a great deal of confusion to viewers reading in their personal RSS reader and, if on the site itself, can impact visitors there as well. It makes people distrust the site and, if they are using an online RSS reader, causes some to feel that they are violating copyright just by reading the feed.

But while this method of protecting content has many different flaws, it also has some merit. There is at least one way that warning messages can help protect your content, though not in the way many think.

One Advantage

Though warnings may not stop scraping or even effectively warn users, they can defend against one of the fears that accompany scraping: search engine penalties.

In short, many Webmasters understandably fear that scrapers, by publishing duplicate content to the Web, will either push their pages down in the rankings or replace them entirely.

By including the domain and a link to the original site, search engines are better able to distinguish between the copy and the original, making it easier for them to penalize the infringing party.

With that in mind, such warning paragraphs become a quick and easy way to guard, not against user confusion or against scraping, but against the search engine penalties that may come along with it.

Conclusions

However, to gain that benefit, there is no need to place a stern warning or message in the header or footer of the article. Instead, you are generally better served by linking to your own site throughout the article, including to older entries, and adding a simple byline in the first or last paragraph of each entry.

These may not always be picked up, but they are more likely to be noticed than any warning and they carry far fewer repercussions.

It’s a simple, easy and even productive step that you can take to minimize the impact of content theft. In addition to helping the search engines detect scraping, it can also aid your visitors in finding old, but still useful, entries and improve the overall SEO of your site.
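For bloggers who would rather automate that kind of byline than type it into every post, the general idea can be sketched in a few lines. The snippet below is only an illustration: the add_feed_byline helper, the post data and the URLs are hypothetical and not tied to any particular blogging platform or feed plugin.

```python
# A minimal sketch of the automated-byline idea, using only the standard library.
# All names and URLs here are made-up examples, not a real blog's setup.

from html import escape


def add_feed_byline(post_html: str, title: str, canonical_url: str,
                    site_name: str = "Example Blog") -> str:
    """Append a short attribution line to a post before it goes into the feed.

    The byline links back to the canonical URL, which helps search engines
    (and readers) tell the original apart from any scraped copies.
    """
    byline = (
        f'<p><em>"{escape(title)}" was originally published on '
        f'<a href="{escape(canonical_url)}">{escape(site_name)}</a>.</em></p>'
    )
    return post_html + "\n" + byline


# Example usage with hypothetical post data.
if __name__ == "__main__":
    body = "<p>Here is the full text of the post...</p>"
    print(add_feed_byline(
        body,
        title="Warning Headers",
        canonical_url="https://example.com/warning-headers/",
    ))
```

Because the byline is appended automatically, it travels with the feed content that scrapers grab, while staying short enough not to annoy legitimate readers.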

The bottom line is that, while placing warnings in the headers of posts may be a flawed strategy, the concept of tagging work as your own is a sound one. The secret though is in how you do it.

A smart strategy helps everyone while increasing protection; a bad one protects nothing and risks turning off legitimate readers.

The choice, when it is all said and done, is simple.

