Automatic Plagiarism
Software to read RSS feeds is nothing new. Most mail clients offer the feature, and several smaller products are targeted specifically at parsing and reading RSS and Atom feeds. Software to post to blogs automatically is nothing new either. Several packages can post to blogs without the aid of a browser, including the ability to post across multiple blogs and in different formats.
It was only a matter of time before someone sinister combined the two to create a form of automated plagiarism.
The idea is simple. You create a blog centered on a topic and plug a batch of relevant feeds into your software. The software scans for the keywords you choose, steals (or “scrapes”) the content that matches, and posts it to your new temple of copyright infringement. Throw in a few advertisements (like Google AdSense) and you have an automated money machine running twenty-four hours a day, seven days a week.
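To see how little effort this takes, here is a minimal sketch of that workflow. It is an illustration, not a working product: the feed URLs, keywords, and blog endpoint are placeholders, and the posting step assumes a target blog that exposes the standard MetaWeblog XML-RPC API.

# Minimal sketch of the scraping workflow described above.
# Feed URLs, keywords, and the blog endpoint are placeholders.
import feedparser          # third-party: pip install feedparser
import xmlrpc.client

FEEDS = ["http://example.com/feed.xml"]            # feeds to monitor (placeholder)
KEYWORDS = {"copyright", "plagiarism"}             # topic keywords (placeholder)
BLOG_ENDPOINT = "http://example.org/xmlrpc.php"    # hypothetical target blog

def matches(entry):
    """Return True if any keyword appears in the entry's title or summary."""
    text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
    return any(keyword in text for keyword in KEYWORDS)

def repost(entry):
    """Repost the scraped entry verbatim via the MetaWeblog API."""
    server = xmlrpc.client.ServerProxy(BLOG_ENDPOINT)
    post = {"title": entry.get("title", ""), "description": entry.get("summary", "")}
    server.metaWeblog.newPost("1", "user", "password", post, True)

for url in FEEDS:
    for entry in feedparser.parse(url).entries:
        if matches(entry):
            repost(entry)   # copies the content wholesale, no attribution

A loop like this, left running on a schedule, is the whole “business model”: no writing, no editing, just other people’s posts republished under someone else’s ads.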
However, if it’s a license to print money, it’s printed on the back of their victims’ copyright certificates.
An example of how this could work is Buzzburner by Feedburner. Buzzburner is fundamentally designed to help cross-promote different blogs by displaying information from established feeds. It’s a very cool utility that also showcases the potential for trouble. After all, one could theoretically take this code and center an entire blog around it, no original content required.
However, such a use for Buzzburner would be very limited. Not only is the content JavaScript-based, so it isn’t indexed by search engines (search placement being a major reason blog thieves steal content), but it also doesn’t select articles based on content; it’s indiscriminate by nature. In the end, Buzzburner is a legitimate tool that merely gives us a glimpse of the potential for harm.
Rest assured though that the bad guys have all of the tools they need to automatically scrape and repost. Several bloggers have already fallen victim to these farming sites and the problem is only going to grow as the theft becomes easier and more lucrative.
However, even the best scraping software doesn’t make for good plagiarism. Most bloggers, myself included, link to or reference past articles. Automatic software, which simply copies and pastes the post, doesn’t remove those links. This clearly marks the stolen articles as unoriginal and taints the blog in the eyes of even the most casual observer.
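That trail of leftover links is also easy to check for programmatically. As a rough illustration (not a tool from this article), the snippet below scans a suspect page for hyperlinks pointing back to the original blog’s domain; the URLs and domain are placeholders.

# Rough illustration of the "leftover links" tell described above.
# The suspect URL and original domain are placeholders.
import re
import urllib.request

SUSPECT_PAGE = "http://scraper-blog.example/post-123"   # hypothetical scraped copy
ORIGINAL_DOMAIN = "original-blog.example"               # the blog being copied

html = urllib.request.urlopen(SUSPECT_PAGE).read().decode("utf-8", "ignore")

# Any hyperlink back to the original domain is a strong hint the post was
# scraped wholesale, since automated copy-and-paste leaves internal links intact.
leftover_links = re.findall(
    r'href="(https?://[^"]*%s[^"]*)"' % re.escape(ORIGINAL_DOMAIN), html)

if leftover_links:
    print("Likely scraped copy; links back to the original:", leftover_links)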
This doesn’t matter to the scrapers, though; they just want search engine fodder that drives traffic and earns clicks. They don’t care whether anyone actually reads the plagiarized content, so long as they click the ads. Quantity, not quality, is what counts.
For the time being at least, the biggest threat still comes from human plagiarists who copy and paste content, manipulate it to appear as if it were their own, and repost it. They cause the most confusion, benefit the most from their plagiarism, gaining both reputation and revenue, and are the most difficult to stop.
The worry, though, is that the scrapers will upgrade their software to address many of these shortcomings and will be able to create whole blogs that can be mistaken, even by humans, for the genuine article. This will create many new challenges for bloggers and may force a shift in the way RSS feeds are used.
Because, no matter how true the maxim “there is no such thing as bad technology” may be, evil minds can warp advances of any kind to their advantage. More than anything, this may expose a serious vulnerability in the concept of RSS and force us all to look at the way we use it and seek out ways to improve the technology so that it can’t be perverted again.
At least, not so easily, that is…