A Scrape of a Scrape

By Jonathan Bailey • Aug 7th, 2007 • Category: Articles, Personal Experiences, Prevention

I often get asked by reporters and bloggers alike exactly how bad scraping is on the Web. I discuss my past experiments on the topic and how, depending on your keywords, suspicious traffic starts showing up with the first post.

However, as I was searching for information on IE7 security flaws for another site I’m working on, I ran across something that was truly mind-blowing.

On Google Blogsearch, this result (nofollowed) was one of the first to pop up. One look at it and you can clearly tell that it is a scrape of another post. However, kindly enough, the scraper left information about their source. I followed through on that and was taken to this entry (nofollowed), yet another scraped page.

It was only after following the results link there that I was taken to the original post on the IEBlog.

It is stunning, though not surprising, to think that scraping is so common that scrapers are picking up each other’s blogs. What makes this situation unusual is that we were able to follow the trail, since both scraper sites link back to their source. However, it shows the potential for a post to get scraped again and again as its copies get picked up by other spambots.

In short, every feed your work appears on can, and most likely will, be scraped, even if the appearance is unwanted. It may even be possible to piece together much longer chains of scraping, where you end up with a fifth or sixth generation scrape.
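For anyone who wants to measure such a chain rather than click through it by hand, here is a minimal sketch in Python. It assumes you have already noted, for each copy, the page it credits as its source. The function name, the dictionary, and the example.com-style URLs are all hypothetical placeholders, not the actual sites discussed in this post.

```python
from typing import Dict, List, Optional


def trace_scrape_chain(start: str, source_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow credited-source links from `start` until no further source is known.

    `source_of` maps a page to the page it credits (None for the original).
    Stops early if a page is seen twice, so scrapers that credit each other
    in a loop cannot cause an endless walk.
    """
    chain = [start]
    seen = {start}
    current = start
    while True:
        nxt = source_of.get(current)
        if nxt is None or nxt in seen:
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


# Hypothetical data: two scraped copies that each credit the page they copied.
credited_source = {
    "scraper-b.example/post": "scraper-a.example/post",
    "scraper-a.example/post": "original.example/post",
    "original.example/post": None,
}

chain = trace_scrape_chain("scraper-b.example/post", credited_source)
generations = len(chain) - 1  # number of copies between the start and the original
print(" -> ".join(chain))
print("generations of scraping:", generations)
```

This only works when every scraper in the chain credits its source, as both did here. As soon as one copy drops the attribution, the trail ends at that page.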

In this case, the first feed was most likely a scrape of a search engine feed such as Google Blogsearch or Technorati. The second one is a news site that, it appears, is reposting and redistributing the entire content of feeds in certain places, though stripping formatting in the process.

This gives us yet another reason to get a handle on our RSS feeds and make sure that they don’t fall into the wrong hands to begin with. Though these sites attributed their use, most are not so generous, and even attributed scraping can cause problems.

All in all, it is best to be mindful of this problem and respond accordingly.

Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

View Comments to “A Scrape of a Scrape”

  1. That is pretty humorous when it happens. I had that problem a few times in the past with some articles I wrote for places like Article City. A splogger stole the article, then another one got it, and another one, etc. Got about 5 levels deep.

  2. JB says:

    Jeremy,

    Maybe we should have a competition and see how deep we can track one of these. The one I posted was three levels deep, you’ve had five. I wonder how low we can go…

  3. Mike Goad says:

    I just followed the links back that I got as a result of Darren Rowse’s 31 day project – and it led straight back to a scrape of his post. I deleted that trackback from the comments. While I know it’s insignificant compared to the total amount of “link love” this scum is getting from it, they’re not going to get it from me.

    (I just hate this kind of unethical stuff on the internet)

    Oh, and thanks for stopping by my blog and commenting. I’ve added your blog to my feeds. I have had an interest in copyright for quite a while for much the same reasons as you, except my issues were with copyright and online genealogy.

  4. JB says:

    Mike,

    Glad to have you stop by. If more people took your interest in denying “link love” then spammers would have to find a new means of propagating their networks. Congrats on the kill.

    Let me know if I can help in any way; I’m always here if I can assist!
