An Even Darker Side of Scraping

Rose DesRochers recently posted a comment on my article about keyword splogging, asking whether it was happening to her. After a brief email exchange and some quick investigating, I concluded that it most likely was exactly what her splogger was doing. From all appearances, he is taking RSS feeds from across the Web, using software to find articles containing the keywords he wants, and posting them to his splogs. 

However, as I was investigating, I found something far more disturbing. Though the content theft alone was bad enough (the articles were attributed but still used in full without permission and in a commercial environment), the scraping was done almost completely blindly, resulting in one of the worst reuses of content I have ever seen. 

In short, in a bid to promote "teen" pornography, the splogger stole DesRochers' parental guide for keeping children safe online, as well as content from a Web site dedicated to missing and exploited children.

The splogger, whom domain registration records list as a French native named Pankaj Saini, is using content from sites dedicated to protecting children to promote "teen" pornography.

Sadly, bloggers writing about missing children aren't the only ones at risk.

The Danger

Keyword splogging, though significantly "smarter" than straight RSS scraping, is still fairly dumb. Such splogging applications see the desired keyword and grab the article. They make no judgment about the article itself beyond the fact that it contains those words.

This means that any article with popular spam keywords is a likely target for such scrapers, even if the article or site is decidedly against whatever the spammer is promoting. 

For example, if you run a blog that helps recovering gambling addicts, your work is very likely being used to promote online casinos. Likewise, if you run a site that examines the side effects and potential dangers of prescription medications, you'll likely find your content helping to promote online pharmacies.

It's a strange twist of fate in a world where machines judge content not based upon the nature of the work, but on the individual words it contains.
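To show just how blind this kind of filtering is, here is a minimal Python sketch of a keyword matcher. It is an assumption about how such tools might work, not code taken from any real splogging application, and the keyword list is hypothetical. A post warning about gambling addiction and a post promoting a casino both match, because the function only checks whether a word appears, never what the article says about it.

```python
import re

# Hypothetical keyword list a spammer might target; illustrative only.
TARGET_KEYWORDS = {"casino", "poker", "pharmacy"}


def naive_keyword_match(text: str, keywords: set[str]) -> bool:
    """Return True if any keyword appears as a whole word, ignoring context entirely."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(keywords)


recovery_post = "Ten warning signs that online casino play has become an addiction."
promo_post = "Best online casino bonuses this week!"

for post in (recovery_post, promo_post):
    # Both posts match: the matcher has no notion of the article's stance.
    print(naive_keyword_match(post, TARGET_KEYWORDS), "-", post)
```

Nothing in the check distinguishes an article fighting a problem from one profiting from it, which is exactly why anti-gambling, drug-safety, and child-safety content ends up on splogs.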

But Wait… It Gets Worse

Most people consider sploggers' failure to attribute their sources one of the most vile elements of the splogging universe, but with these spammers, the lack of attribution can come as something of a relief. 

Some sploggers, as in DesRochers' case, do attribute their sources, effectively tying the victim's name to the very thing that person is trying to stop or prevent. This can feasibly damage the reputation of someone who is working hard to deal with a very serious social or personal issue. In an era where employers frequently search Google for names, as do love interests and others, this type of connection can have a very serious impact on one's life.

This adds a new level to splogging, raising it from just pure content theft to a form of identity theft. It makes it appear, at least to the casual observer, that the victim supports the advertisers and products promoted by the splogger, even though the opposite is actually true.

It's a frustrating situation, made even more frustrating by the challenges that come up when trying to stop it.

Cessation

The good news is that, like any other kind of splogging, this twisted keyword splogging is a copyright violation and can be acted on in the same ways as any other infringement. However, it's important to be careful, especially with permissive Creative Commons licenses and other liberal copyright licenses, as they do not distinguish between good uses and bad uses. In short, even a splogger who twists words and damages the victim's reputation might be acting completely legally if the work is licensed under very loose terms.

As part of ethical plagiarism fighting, it is important to stick to your license, even if you don't like the way the work is used. This makes it critical to consider these situations when picking a license for your content, and to make sure it both prevents the uses you aren't comfortable with and allows the ones you are.

Finally, it is very important that content creators be aware of the potential keywords found in their material and, if their keywords are likely targets for spammers, that they begin to more actively search for their own content so they can more effectively protect both their name and their work.
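As one way to put that advice into practice, here is a minimal Python sketch that flags which of your posts contain keywords spammers commonly target and builds an exact-phrase search query for each flagged post. The keyword list, the function names, and the "longest sentence" heuristic are all hypothetical choices for illustration, not an established tool. Most search engines treat text in double quotes as an exact-phrase search, so the generated query can be pasted into one to look for copies of your work.

```python
import re

# Hypothetical list of keywords spammers often target; adjust it to your own niche.
SPAM_KEYWORDS = {"casino", "gambling", "pharmacy", "viagra", "teen", "loans"}


def flag_at_risk_posts(posts: dict[str, str], keywords: set[str]) -> dict[str, set[str]]:
    """Map each post title to the spam keywords its body contains, keeping only posts with hits."""
    flagged = {}
    for title, body in posts.items():
        hits = set(re.findall(r"[a-z]+", body.lower())) & keywords
        if hits:
            flagged[title] = hits
    return flagged


def exact_phrase_query(body: str, max_words: int = 12) -> str:
    """Build a quoted search query from the post's longest sentence, truncated to max_words."""
    sentences = [s.strip() for s in re.split(r"[.!?]", body) if s.strip()]
    if not sentences:
        return ""
    longest = max(sentences, key=len)
    return '"' + " ".join(longest.split()[:max_words]) + '"'


if __name__ == "__main__":
    posts = {
        "Keeping kids safe online": "Talk to your teen about chat rooms. Set clear rules for the family computer.",
        "My garden": "Tomatoes did well this year.",
    }
    for title, hits in flag_at_risk_posts(posts, SPAM_KEYWORDS).items():
        print(title, sorted(hits), exact_phrase_query(posts[title]))
```

Long, specific sentences are used because they are less likely to match unrelated pages than short, generic ones; a flagged post is simply a candidate for closer monitoring, not proof that it has been scraped.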

There is little doubt that scrapers do not care what they take. Your content, whether or not it suits their goals, could wind up on a splogger's site just because it contained the words they were looking for.

Conclusions 

In the end, there are very few extra precautions or steps one can take to deal with this kind of splogger. Mostly, this case exposes the sheer recklessness behind splogging and sploggers' extreme lack of ethics.

Sploggers may not know that their scrapers swipe content written for righteous reasons and place it in a very sinister light, but they have to at least be aware of the possibility. The fact that they keep scraping and splogging anyway shows exactly how far these "black hats" will go to make money.

It is sad, but not entirely unexpected from a group of people who make their living by stealing other people's hard work and cluttering the Internet with useless junk. 

Comments
Alex

I find this sociopathic behavior endlessly depressing. In my fantasy world, I would require ID verification before giving anyone an Internet account. If this account is used for abusive purposes, the user is penalized or banned. Simple concept. If you cannot play by the rules, you do not get to play at all. If the ISP does not hold the user responsible, then the ISP is penalized. Accountability, all the way.

By the way, this has nothing to do with "freedom of speech." Only *abusive* behavior is penalized. Certainly, we would have to define what is acceptable and unacceptable, but this is not an impossible task. Those rules would also have to be periodically revised, but this is how society works, anyway. Our laws are not perfect, but they serve a purpose. What is the alternative? Complete anarchy?

Of course, I realize this is not going to happen, but I can still dream about it. We live in a world where drunk drivers are allowed back on the road, and there is little or no accountability for polluting the Internet.

Matthijs

The internet is dead. It gets worse and worse. I'm not sure I want to know what it will be like 5 or 10 years from now. Will 90% of the content of the web be fake, scraped, commercial blackhat stuff?

Don't you find it depressing? How do we solve this? Can we?

Rose DesRochers

Jonathon, a very informative article. Thank you for your assistance today. It was much appreciated. The content has been removed from his websites.

Trackbacks

  1. [...] seen sploggers scrape anti-child pornography sites in to promote “teen porn”, scrape hundreds, even thousands, of feeds to capture just a [...]

  2. [...] simply not blogging about these topics is not enough. As Rose Desroches found out over a year ago, spam bloggers determine what content is scraped not by the actual content, but by targeted [...]