Cloaking to Stop Scraping

Cloaking is generally thought of as a black hat SEO tactic that involves tricking a search engine into believing a page has one thing on it when, in fact, visitors will see something else, often completely unrelated.

However, cloaking is actually much broader than that. Though the black hat use is the best known, cloaking can also be used to offer different content to different browsers to ensure compatibility, different sized images to different screen resolutions or even content in different languages based on the visitor's country. Any time a page automatically displays different content to one person or group of people than it does to another, that is considered cloaking.

White hats recently found another positive use for cloaking: the ability to stop scraping by providing different content to a scraper than to the rest of the world. This has proved detrimental to one splogger and has earned one hacker his fifteen minutes of fame.

Best of all, the hacker in question is showing the world how to repeat his trick, including offering code that enables any WordPress user to fight back in much the same way.

The Ballad of RSnake

RSnake is a blogger at ha.ckers.org, an Internet security site that has been in operation since May. Last week, he discovered that a scraper was stealing his content but, rather than filing a DMCA notice or even contacting the scraper's registrar, he decided to research his plagiarist and collected a great deal of personal information about him. Then, using a bit of code, he modified his RSS feed to return a different page when the scraper returned, causing the scraper to republish a lengthy paragraph of personal information, including his name, address and more.

The plagiarist caught the change quickly and shut down the offending site while offering an apology for his misdeeds.

The response to RSnake's technique and the resulting shutdown has been overwhelmingly positive. While many have been worried about the posting of such personal information (RSnake has since removed the information from his site), most have agreed that the idea of "cloaking" an RSS feed to thwart spammers is very exciting.

This has prompted RSnake, as well as others, to offer up code enabling others to do the same with their own blogs.

Needless to say, this could grow to be a powerful tool to help many, especially those with their own servers and blogging software, combat RSS scraping.

How it Was Done (WordPress)

The first step in cloaking content from a scraper is finding the IP address of the server doing the scraping. That can be tricky, especially if one isn't very comfortable with networking tools and terminology, but it is usually just a matter of looking at the server that the scraped content appears on. Since most scrapers run their software on the same server that they publish their spam blogs on, the IP that is pulling the content is likely to be the same, or very close to it. DomainTools.com can help you translate a site address into an IP address, greatly speeding up the process.

Even with that information, you will likely need to search your server logs, which are available with most paid hosting accounts, to find the IP address of the machine that is scraping your feed. Just look for an IP address that is close or identical to the plagiarist's server and check whether the access times roughly line up with when new posts appeared on the plagiarist's site.
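For those comfortable on the command line, this log search can be partly automated. The sketch below assumes an Apache-style access log and a feed served at /feed/; the log excerpt and the IP addresses are made up for illustration:

```shell
# Create a hypothetical Apache-style access log for illustration;
# in practice, point the grep below at your real access_log instead.
cat > /tmp/access_log <<'EOF'
203.0.113.7 - - [12/Sep/2006:10:15:01 -0500] "GET /feed/ HTTP/1.1" 200 5120 "-" "-"
198.51.100.2 - - [12/Sep/2006:10:17:09 -0500] "GET /about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
203.0.113.7 - - [12/Sep/2006:11:15:02 -0500] "GET /feed/ HTTP/1.1" 200 5120 "-" "-"
EOF

# Count which IP addresses request the feed, most frequent first.
# A scraper typically shows up as a single IP polling on a fixed
# schedule, often with a blank or scripted user agent.
grep '"GET /feed/' /tmp/access_log | awk '{print $1}' | sort | uniq -c | sort -rn
```

In this made-up log, 203.0.113.7 polls the feed hourly and would be the address to compare against the plagiarist's server.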

Once you are relatively certain you have found your plagiarist, all you need to do is insert the following code into your WordPress wp-rss2.php template (Note: This method works only in WordPress; we'll discuss other systems in a moment).

Special thanks to RSnake for sharing this code with all of us! 

First, look for the following lines:

<?php the_category_rss() ?>
<guid isPermaLink="false"><?php the_guid(); ?></guid> 

Then, after that, add the following lines, replacing the Xs with the IP address of the scraper, the Ys with the fake descriptions and the Zs with fake content:

<?php if ($_SERVER['REMOTE_ADDR'] == "XXX.XXX.XXX.XXX") : ?>
<description><![CDATA[YYYYY]]></description>
<content:encoded><![CDATA[ZZZZZ]]></content:encoded>
<?php else : ?>

Finally, nine more lines down from that, close the "if" statement with the following tag:

<?php endif; ?>

If done correctly, it will serve the scraper a fake RSS feed with whatever content you specified. To test it out, place your own IP address in the field and visit your feed. If you get the fake content, everything is working according to plan.
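For readers who want to see the logic outside a WordPress template, the whole trick boils down to a single conditional on the requester's address. Here is a minimal, self-contained sketch in Python; the IP address and the feed strings are illustrative placeholders, not part of WordPress:

```python
# Minimal sketch of RSS cloaking: serve decoy content to a known
# scraper IP and the real feed to everyone else. SCRAPER_IP and the
# feed bodies below are placeholders for illustration.

SCRAPER_IP = "203.0.113.7"

REAL_FEED = "<item><title>An actual post</title></item>"
FAKE_FEED = ("<item><title>If you are reading this, the site you are on "
             "is scraping my content.</title></item>")

def feed_for(remote_addr: str) -> str:
    """Return the decoy feed for the scraper's address, the real feed otherwise."""
    return FAKE_FEED if remote_addr == SCRAPER_IP else REAL_FEED

print(feed_for("203.0.113.7"))   # the scraper gets the decoy
print(feed_for("198.51.100.2"))  # everyone else gets the real feed
```

The PHP template code above does exactly this, with `$_SERVER['REMOTE_ADDR']` supplying the requester's address.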

As far as the fake content goes, a blank feed would put the least strain on bandwidth and server resources, but it can contain whatever you desire, including a general broadcast saying something to the effect of, "If you are reading this, the site you are on is a scraper and is attempting to use my content illegally."
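If you would rather serve a decoy than a blank page, something like the following is a minimal, well-formed RSS 2.0 document along those lines (the wording and the example.com link are placeholders to adapt):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Unauthorized copy</title>
    <link>http://example.com/</link>
    <description>If you are reading this, the site you are on is a
    scraper and is attempting to use my content illegally.</description>
  </channel>
</rss>
```

A feed with no item elements is still valid, so the scraper's software has nothing to republish except the notice itself.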

In that regard, one would be following in the footsteps of what visual artists have been doing for years to protect their images from being hotlinked and plagiarized. 

Other Systems 

While I am looking for similar code for other popular blogging platforms, such as MT, Textpattern, etc., one blogger, who, oddly enough, focuses on black hat SEO techniques, has devised a way to perform a similar cloak on any site hosted on a server with an editable .htaccess file (Note: This generally only includes paid hosting accounts).

Once one has discovered the IP address of the scraper and created a page filled with fake content, perhaps a generated RSS feed, all they have to do is place these three lines in their .htaccess file:

RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^XXX\.XXX\.XXX\.XXX$
RewriteRule ^(.*)$ http://newfeedurl.com/feed [R=302,L]

Once again, the Xs represent the IP address of the scraper. However, this time, http://newfeedurl.com/feed represents the address of the fake content page or feed.

Needless to say, this method is only for individuals that have access to their .htaccess file and are comfortable editing it. If you need information about .htaccess, you can find an excellent guide on it here.

Limitations

There are several obvious limitations to these methods. First, they cannot yet be used with free accounts such as Blogger, MySpace or Xanga. Though all of these sites produce RSS feeds, none offers the template access or the server access required to make this kind of redirection work (Note: It may be possible to do this with a free WordPress.com account; I do not know).

Second, they cannot be used by anyone taking advantage of FeedBurner's service, at least not yet. Though FeedBurner does an excellent job of protecting a feed by detecting and reporting "uncommon uses" of it, it does not offer a way to prevent specific individuals from accessing a feed. However, this may be a feature that comes out at a later date.

Also, these techniques are only for people who are comfortable with discovering a scraper's IP address and applying code to either a PHP template or an .htaccess file in order to create the redirection. While those who are familiar with the technology and have used it for some time will have no problem with this, those who are new to running a site or haven't dabbled in these areas might be intimidated.

Finally, there's nothing to stop the scraper from simply moving his operations to a new IP address or a new domain altogether. While the same is true for DMCA notices and other plagiarism cessation techniques, those usually result in the closure of the plagiarist's account, costing him money and time. With cloaking, it is trivial for the scraper to move his operations to another server that he or she has already prepared.

Still, it is a powerful and immediate way to stop a scraper. If nothing else, it might be a good stop gap measure to prevent further abuse while waiting for other countermeasures to actually close the site. It can also be very effective in international cases and situations where the host is extremely uncooperative.

Clearly, for those capable of using it, it is a tool to consider.

Conclusions

In the end, the effectiveness of this tool may be limited by one's technical prowess, how much control one has over one's setup and the nature of the scrapers that abuse the work. It may not be perfect for every person in every situation, but in the situations where it does work, it works almost perfectly.

If nothing else, it is important to be aware of this weapon and consider it as a potential tool for legitimate bloggers to protect their content. Not only is it strangely fitting to turn a black hat SEO technique against the people that practice it, but it is also powerful, immediate and surgical in nature. In short, it is effective, quick and harms no one other than the plagiarist.

In that regard, it is better than many of the currently available techniques and, while it isn't a replacement for more traditional routes, it can be a useful tool to stop and prevent RSS scraping.

It is an extra weapon in a war where our options are, for the most part, severely limited.  

[tags]Plagiarism, Content Theft, Scraping, Cloaking, Black Hat SEO, Splogging, Splogs[/tags] 

40 comments
Brett Wraight

Check out these guys http://www.scrapestopper.com They are able to stop any type of scrape attack I do not know how they do it but they stop all forms of scraping…They have a trial period Awesome and there system is so easy worth a look at..

simple

locating a scraper using server IP address, when it is being run from a shared hosting, will be quite tricky. more than one users on the same IP address, at best you will find that IP belongs to hosting provider and no point publishing "personal details" of hosting provider.

In my case, running a wpmu site, I got struck by scraping attack from Russian IP. I concluded it is a 'scraping' based on amount data they downloaded from my site in a short span of time (less than 12 hours). Data downloaded was like 500mb, closed to 12000 http request. that practically get my site almost inaccessible during that period. when I tried reverse DNS, not much information other than that IP belongs to a russian ISP.

so my best remedy was just to block that IP range (95.108). Any way, blogs on my site are mostly in English, I doubt a russian will be reading it or registering a blog.

I wonder how google (blogger.com) is dealing with this kind of problem.

sham

Awesome!!!

Mike Baptiste

This is a neat concept. The ones that really floor me are these clowns who create a blog based on a google search term where they grab any post hit by the given term. Even in quantity I can't imagine how this gets them any reasonable revenue. The sites are horrid looking and often badly formatted, usually with just Google Adsense. Where would their traffic come from?

Plus you have to wonder how you'd try to get them taken down since they usually link back to the original article and don't always use the whole thing. I've never worried too much about it but it sure makes your incoming links look weird.

JB

Hung,

Thanks for the link! I greatly appreciate it. It's a good read and anyone that is interested in this article needs to take a look at it.

Somber One,

You don't put the code directly into your feed, but rather, in the template for your feed. It only works with Wordpress and, apparently, not with free wordpress.com accounts. If you have your own install of Wordpress, which I don't believe you have on your site, you can edit the templates.

I'm working on getting other code for different formats.

Ja,

Yes, there are plenty of ways to get around it, fortunately sploggers are a "set and forget" crowd that likely won't bother. It's just easier to move on.

I can honestly say I don't know much about Microformats. I'm going to take some time to look at it today and I'll see about doing a writeup either Friday or Monday. I'll look at your site a little bit later.

Thanks for the tip!

Merideth,

Thanks for the information, it's disappointing, but not unexpected. I'll edit the article in a second to reflect that.

Oskar,

There are weaknesses, but it is very unlikely that you'll be on the same proxy as the splogger. The reason is that splogger software usually runs on the server itself, the same as the site. Odds are you won't have your home connection on the same proxy as a Web site.

There are plenty of weaknesses, but I don't see this as a common problem. Let me know if I'm wrong though, there might be something I'm not seeing.

Oskar Syahbana

However there's weakness in this method. How if I am currently on the same proxy server as the spammers? Wouldn't that mean that I won't be able to receive the correct RSS?

Meredith

Wordpress.com users don't seem to have anywhere to put that code.

Ja

Excellent writeup! Plenty of easy ways to get around ip-specific stuff like this, but at least it's likely to catch a bunch of unsuspecting sploggers that employ the methods used here.

I've been meaning to ask, what are your thoughts on microformats, particularly hreview, making it so simple to reproduce and misuse (intentionally or not) another's data along with a laundry list of other aggregating and indexing issues for hreviews?

If you have no idea what I'm talking about don't worry (yet). I've done some writeups about some of the issues on my blog if you care to take a peek. I think it would interest you.

The Somber One

Those lines that you gave, where do I put them, in the RSS feed?

Hung Truong

I also found that someone was ripping info from my site, and did something similar with .htaccess. I made up a fake RSS feed which had 1000 entries of nonsense. Then I redirected the spammer's IP to go to that fake RSS. You can read about it at my personal blog.

While it's easy to do for one offender, it might get messy when so many splogs spring up daily.

Trackbacks

  1. [...] Cloaking to Stop Scraping How to cloak your blog content from known content thieves. This is a little extreme but it did have very good results in this case.   Related Posts from the Past: [...]

  2. [...] Petard. Own. Hoist. Caught this tidbit from the WordPress news headlines – how to defeat sploggers, blackhat SEOs, and other kinds of content thieves by feeding them their own special RSS feed. If they’re going to use RSS to steal content, feed ‘em crap, like, oh, their own WHOIS data or George Carlin’s words you can’t say on television (or whatever). [...]

  3. [...] Here is a way to use cloaking to get back a little sumthin’ from those who scrape your content. [...]

  4. [...] Plagiarism Today has an excellent article for blog owners on preventing spam sites from using your RSS feed in order to steal your content. [...]

  5. [...] Cloaking to Stop Scraping – a positive use for cloaking stop scraping by providing different content to a scraper than to others… If done correctly, it forwards the scraper to a fake feed with whatever content you specified: “If you are reading this, the site you are at is a scraper and is attempting to use (tags: Scraping) [...]

  6. [...] Webmasters, especially of larger sites, have grown accustomed to these scraping applications and have taken actions against them including cloaking content, banning IPs, truncating RSS feeds and making their templates harder to scrape. Scrapers, on the other hand, have improved their software to improve their effectiveness, creating a game of cat and mouse that has become all-too-familiar on the Web. [...]

  7. [...] Reconozco que copié el sistema de esta web en inglés, pero lo voy a explicar en español y con mis palabras. [...]

  8. [...] Plagiarism Today es un blog que se describe como “el blog de un frustrado con la plaga de plagio online” con algunos ejemplos de cosas interesantes (ej: cloaking para frenar el scraping) o notas sobre el tema pero ¿puede estropear el desarrollo de la blogosfera el plagio? Digo.. ¿está tan extendido? [...]

  9. [...] Simply put, scraped content comes with too many problems to be effective. It’s unrelaible, often of poor quality (especially for keyword density purposes), can land a spammer in copyright troubles and may subject the spammer to duplicate content penalties, severely hurting his rankings. It can also lead to embarassing situations and horrible mistakes. [...]

  10. [...] Consider Prevention: There are various ways to prevent a site from scraping a feed. Though truncating a feed may not help in these situations, cloaking can. [...]

  11. [...] If you run your own Web server or use a paid host, you have a great deal of control with your site. If you find that someone is scraping your feed, you can take steps to stop them. [...]

  12. [...] splog turko que por lo que veo recién acaban de crear pero ya le puse stop por medio de un post de plagiarism y warari warara asunto [...]

  13. [...] is some info on fighting splogs, and this is a delightful read (and potentially useful for those with more control over their blogging [...]

  14. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  15. [...] Cloaking to Stop Scraping ( Sursa: Plagiarism [...]

  16. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  17. [...] Cloaking to Stop Scraping (Source: Plagiarism [...]

  18. [...] 23. Cloaking to Stop Scraping (Source: Plagiarism Today) [...]

  19. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  20. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  21. [...] Cloaking to Stop Scraping (Source: Plagiarism [...]

  22. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  23. [...] method was originally thought of by a webmaster called RSnake, and he shared the codes here so that it could be used by other webmasters as a means of protecting themselves as [...]

  24. [...] 111. WordPress Ajax Commenting revisited 112. Widgetizing Themes (Source: Automattic) 113. Cloaking to Stop Scraping 114. Server load button for blogs 115. Giving each WordPress post a thumbnail, and display the [...]

  25. [...] This is a technique that I found on another blog and I have not tried it myself. Basically the idea is to fool splog bots by providing them a different version of the feed content than what is seen by humans and clean bots. You may read about this feed cloaking technique here [...]

  26. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  27. [...] 111. WordPress Ajax Commenting revisited. 112. Widgetizing Themes. (Source: Automattic.) 113. Cloaking to Stop Scraping. 114. Server load button for blogs. 115. Giving each WordPress post a thumbnail, and display the [...]

  28. [...] can also be mitigated against, there isn’t much to stop them.One other possibility is to use cloaking to trick scrapers into grabbing the wrong content, but that possibility also carries its own risks with it.Bottom LineGiven the fact that there is [...]

  29. [...] another will say well no, not if you add this to .htaccess or apache2.conf or we should try this cloaking thing or … argh. A simple cost/benefit analysis on this, in my estimation, is to let it slide in almost [...]

  30. [...] 8. Cloak your RSS feeds. It means provide different RSS feeds to known scrapers. Check your server logs to find the IP address of the person/site who scraps your RSS feed/contents all the time and then serve him different version of the RSS feed. You can find more about this method here. [...]

  31. [...] This is a technique that I found on another blog and I have not tried it myself. Basically the idea is to fool splog bots by providing them a different version of the feed content than what is seen by humans and clean bots. You may read about this feed cloaking technique here [...]

  32. [...] recently ran across an article that discussed one option in the fight against scraping called cloaking. In a nutshell the concept is to insert some PHP into the content that will present alternate [...]

  33. [...] Cloaking to Stop Scraping via Plagiarism Today. [...]