An Even Darker Side of Scraping

Jonathan Bailey, July 12, 2006


Rose DesRochers recently posted a comment on my article about Keyword Splogging, asking if it was happening to her. After a brief email exchange and some quick investigating, I concluded that, most likely, it was exactly what her splogger was doing. From all appearances, he is taking RSS feeds from all across the Web, using software to find articles with the keywords he needs and posting them to his splogs.

However, as I was investigating, I found something far more disturbing. Though the content theft alone was bad enough (the articles were attributed but still used in full without permission and in a commercial environment), the scraping was done almost completely blindly, resulting in one of the worst reuses of content I have ever seen.

In short, in a bid to promote "teen" pornography, the splogger stole DesRochers' parental guide for keeping children safe online as well as content from a Web site dedicated to missing and exploited children.

The splogger, whom domain registry records list as a French native named Pankaj Saini, is using content from sites dedicated to protecting children to promote "teen" pornography.

Sadly, bloggers writing about missing children aren't the only ones at risk.

The Danger

Keyword splogging, though significantly "smarter" than straight RSS scraping, is still fairly dumb. Such splogging applications see the desired keyword and grab the article. They make no judgment about the article itself beyond the fact that it has those words in it.

This means that any article with popular spam keywords is a likely target for such scrapers, even if the article or site is decidedly against whatever the spammer is promoting.
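To see how blind this matching is, here is a minimal sketch in Python. The keyword list, the sample articles and the function name `matches_keywords` are all hypothetical; I have not seen the software these sploggers actually use, and it almost certainly differs in detail. The point is only that a filter looking at words alone cannot tell an article warning against something from one promoting it.

```python
import re

# Hypothetical keyword list; an assumption for illustration only.
SPAM_KEYWORDS = {"casino", "gambling", "pharmacy"}


def matches_keywords(text, keywords=SPAM_KEYWORDS):
    """Return the sorted keywords that appear as whole words in text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & set(keywords))


# Two invented articles with opposite intentions.
recovery_post = "How I quit gambling and stayed away from the casino for good."
promo_post = "Win big at our online casino with no-risk gambling bonuses!"

for post in (recovery_post, promo_post):
    # Both posts match: the filter sees words, not meaning.
    print(matches_keywords(post))
```

Both sample posts come back with the same matches, even though one of them was written to help people stop gambling.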

For example, if you run a blog that works to help recovering gambling addicts, it is very likely your works are being used to promote online casinos. Likewise, if you run a site that looks at the side effects and potential dangers of prescription medications, you'll likely find your content helping to promote online pharmacies.

It's a strange twist of fate in a world where machines judge content not based upon the nature of the work, but on the individual words it contains.

But Wait… It Gets Worse

Though most people find the lack of attribution offered by most sploggers to be one of the most vile elements of the splogging universe, when dealing with these spammers, it can be something of a relief.

Some sploggers, as in DesRochers' case, do attribute their sources, effectively tying the victim's name to the very thing he or she is trying to stop or prevent. This can, feasibly, damage the reputation of someone who is working hard to deal with a very serious social or personal issue. In an era where employers frequently do Google searches for names, as do love interests and others, this type of connection can have a very serious impact on one's life.

This adds a new level to splogging, raising it from just pure content theft to a form of identity theft. It makes it appear, at least to the casual observer, that the victim supports the advertisers and products promoted by the splogger, even though the opposite is actually true.

It's a frustrating situation, made even more frustrating by the challenges that come up when trying to stop it.

Cessation

The good news is that, like any other kind of splogging, this twisted keyword splogging is a violation of copyright and can be acted on in the same ways as any other copyright infringement. However, it's important to be careful, especially with loose Creative Commons licenses and other liberal copyright licenses, as they do not make a distinction between good uses and bad uses. In short, even sploggers that twist words and damage their victims' reputations might be completely legal if the work is licensed under very loose terms.

As a part of ethical plagiarism fighting, it is important to stick to your license, even if you don't like the way the work is used. This makes it critical to at least consider these situations when picking out a license for your content, making sure that it both prevents uses you aren't comfortable with and allows uses that you are.

Finally, it is very important that content creators be aware of the potential keywords found in their material and, if their keywords are likely targets for spammers, that they begin to more actively search for their own content so they can more effectively protect both their name and their work.
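As a rough illustration of that advice, the sketch below checks a post for keywords a spammer might target and, if any turn up, builds a quoted phrase you could paste into a search engine to look for copies. Everything here is an assumption made for the example: the keyword list, the sample post, the function names `risky_keywords_in` and `search_phrase`, and the choice of the first eight words of the longest sentence as the search phrase. Adjust all of it to your own subject matter.

```python
import re

# Hypothetical list of terms a spammer might target; adjust to your own niche.
RISKY_KEYWORDS = {"teen", "casino", "gambling", "prescription", "pharmacy"}


def risky_keywords_in(text, keywords=RISKY_KEYWORDS):
    """Return the sorted risky keywords that appear as whole words in text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & set(keywords))


def search_phrase(text, length=8):
    """Return the first `length` words of the longest sentence, in quotes,
    for use as an exact-phrase search query."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return ""
    longest = max(sentences, key=lambda s: len(s.split()))
    return '"' + " ".join(longest.split()[:length]) + '"'


# Invented sample post, for illustration only.
post = (
    "Keeping kids safe online takes effort. "
    "Parents should talk with every teen about what they share with strangers on the Web."
)

hits = risky_keywords_in(post)
if hits:
    print("Keywords a scraper might match:", hits)
    print("Search for copies with:", search_phrase(post))
```

A post that trips the keyword check is not doing anything wrong; it is simply one worth searching for more often.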

There is little doubt that scrapers do not care what they take, and your content, whether it is appropriate for their goals or not, could wind up on a splogger's site just because it had the words they were looking for.

Conclusions

In the end, there are very few extra precautions or steps one can take to deal with this kind of splogger. All it does is expose the sheer recklessness behind sploggers and their extreme lack of ethics.

Sploggers may not know that their scrapers swipe content intended for righteous reasons and place it in a very sinister light, but they have to at least be aware of the possibility. The fact that they continue to scrape and splog only shows exactly how far these "black hats" will go to make money.

It is sad, but not entirely unexpected from a group of people who make their living by stealing other people's hard work and cluttering the Internet with useless junk.

[tags]Plagiarism, Content Theft, Copyright Infringement, Splogging, Scraping, RSS, Splogs[/tags]

