Warning Headers
By Jonathan Bailey • Feb 1st, 2007 • Category: Articles, Legal Issues, Prevention
If you are not reading this article on plagiarismtoday.com, it is being scraped and the site you are reading this on is guilty of copyright infringement.
The above paragraph is not necessarily true, especially with my Creative Commons license, but many blogs have started putting similar warnings in the leads and footers of their articles in a bid to dissuade scrapers from taking their content. Undoubtedly, if you read many blogs, you’ve seen one or two yourself.
The question, though, is whether the tactic is effective or worthwhile. The headers do produce amusing results from time to time, but do they actually do anything to stop scraping and protect content?
Sadly, the answer appears to be no, though there may still be some merit to the strategy.
The Premise
The idea, as explained by Elaine Vigneault, is simple enough. Include, in your first paragraph, information about your domain and a note that any other site is probably a scraper. Then, when a scraper picks up the article, either in whole or in summary, its site will display the warning, alerting any viewers of the scraped site to the infringement.
This, in turn, draws people to the original site and hurts the ability of the spammer to exploit the content.
An alternative to the plan calls for special periodic posts, such as this one on Dating Dames, that call out scrapers and identify scraping sites. Such posts also often ask for help in identifying scrapers.
All in all, both seem like foolproof plans: an easy, effective and fast way to shame scrapers. However, it rarely works out that way, not because the idea is bad, but because the dynamics of scraping will not allow it to work.
The Problems
Though putting warning headers and/or warning posts in a feed seems like a good idea, there are several problems with doing it.
- Most Scraping is Automated – Scraping is a “set and forget” operation. Most spammers don’t even look at their own sites once they are set up. They will never know that they have been shamed or take any action to rectify it. In short, the scraping doesn’t stop.
- Headers Not Always Grabbed – Most keyword scraping is done by using feeds from sites like Technorati and Google Blog Search. Those feeds pull only a few sentences of your content, namely the ones surrounding the desired keyword. That excerpt, more often than not, comes from the middle of the post and will almost never include a warning header. Furthermore, any whole posts dedicated to the issue will be ignored by this method.
- Not Meant for Human Eyes – The vast majority of scraped sites are not intended for human visitors, but to build up search engine reputation for other sites that appear more legitimate. If almost no people see them, then the warning does no good.
- Warnings Are Ignored – Most of the time, it’s easy to tell that a scraper site isn’t legitimate. The amateurish layout and overly dense keywords are instant clues. Only once, out of dozens of warnings I’ve seen, was I surprised that the scraper was not an original site. Even then, though, most people don’t take the time to look up at their address bar. Furthermore, some scrapers use confusing addresses, such as subdomains, that can make them appear legitimate even with the warning.
- Negative Impact on Viewers – These warnings can cause a great deal of confusion to viewers reading in their personal RSS reader and, if on the site itself, can impact visitors there as well. It makes people distrust the site and, if they are using an online RSS reader, causes some to feel that they are violating copyright just by reading the feed.
But while this method of protecting content has many different flaws, it also has some merit. There is at least one way that warning messages can help protect your content, though not in the way many think.
One Advantage
Though warnings may not stop scraping or even effectively warn users, they can defend against one of the fears that comes with scraping: search engine penalties.
In short, many Webmasters understandably fear that scrapers, by publishing duplicate content to the Web, will either push their pages down in the rankings or even replace their own.
When a post includes the domain and a link to the original site, search engines are better able to distinguish between the copy and the original, making it easier for them to penalize the infringing party.
With that in mind, such warning paragraphs become a quick and easy way to guard, not against user confusion or against scraping, but against the search engine penalties that may come along with it.
Conclusions
However, to gain that benefit, there is no need to place a stern warning or message in the header or footer of the article. Instead, you are generally better served by linking to your own site throughout the article, including to old entries, and adding a simple byline in the first or last paragraph of each entry.
These links may not always be picked up, but they are more likely to be seen and noticed than any warning, and they have far fewer repercussions.
It’s a simple, easy and even productive step that you can take to minimize the impact of content theft. In addition to helping the search engines detect scraping, it can also aid your visitors in finding old, but still useful, entries and increase the overall SEO of your site.
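For those who publish through a script or template, the byline idea could be automated along these lines. This is only a minimal sketch: the function name `add_byline`, its parameters, the byline wording and the example permalink are all hypothetical, and it assumes each entry is available as an HTML string before it goes into the feed.

```python
from html import escape


def add_byline(entry_html: str, title: str, permalink: str, site_name: str) -> str:
    """Return entry_html with a short attribution paragraph appended.

    All names here are illustrative assumptions, not part of any
    blogging platform's API.
    """
    # Escape the values so titles or URLs containing &, < or quotes
    # do not break the surrounding markup.
    byline = (
        f'<p>"<a href="{escape(permalink, quote=True)}">{escape(title)}</a>" '
        f"was originally published on {escape(site_name)}.</p>"
    )
    # Append as the last paragraph of the entry.
    return entry_html.rstrip() + "\n" + byline


if __name__ == "__main__":
    # Hypothetical entry and permalink, for illustration only.
    print(
        add_byline(
            "<p>Post body goes here.</p>",
            "Warning Headers",
            "http://www.plagiarismtoday.com/example-post/",
            "Plagiarism Today",
        )
    )
```

Because the byline is an ordinary link rather than a warning, it reads naturally both on the site and in a feed reader, while still carrying the original domain along with any copy.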
The bottom line is that, while placing warnings in the headers of posts may be a flawed strategy, the concept of tagging work as your own is a sound one. The secret though is in how you do it.
A smart strategy helps everyone while increasing protection; a bad one protects nothing and risks turning off legitimate readers.
The choice, when it is all said and done, is simple.
Tags: Content Theft, Copyright, Copyright Infringement, Copyright Law, Creative Commons, Plagiarism, RSS, Scraping, Search Engine, SEO, Spam, Splogging, Splogs
Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

Thanks for the call out on Dating Dames. The one thing I made sure I did prior to making that particular post was to report them through their Google Ads. Hopefully all these efforts combined will help
Thanks again for all your efforts
Gayla
Gayla,
Just a heads up on this one, to get Google Adsense to respond to your request, you most likely will need to file a DMCA complaint with them. It’s a pain, especially since it has to be faxed or mailed in and is legally dubious, and I’ve written about it before here:
http://www.plagiarismtoday.com/2006/08/31/adsense-and-the-dmca/
Give it a look if you’re interested and, if you need any help, don’t hesitate to send me an email.
Good luck with your scraper!
Worthwhile read. Thanks for writing on this topic.
I’ve heard of linking back to yourself periodically for this and other reasons. And I’d heard that duplicate content lowers your search engine rank.
What I hadn’t thought about was how readers might be turned off by reading a warning. I tend to write whatever I want without thought to who my readers are and what they want to read. Obviously I’ll have to try a little harder if I want to increase and maintain a large readership.
Thanks again for this insightful article.
I’m thinking of turning off full post feeds because of this.
I fear the problem is much more basic. Maybe these strategies do not work as well as they should, because most of the readers plagiarize, themselves! This is a question of values, and I get the sense that more and more people simply do not see anything wrong with plagiarism. Until we fix *that* problem, we are basically taking water out of the boat with a bucket to keep the Titanic from sinking.
Spam Blogs Study shows the largest concentration of Blog Spam comes from Mountain View, CA…
Hmm, Steve Rubel mentioned New Stats on Spam in the Blogosphere by the eBuity Group that places the nexus of Blog Spam in Mountain View, Washington DC and San Francisco."… Most spam blogs are still hosted in the US. We ranked……