CrowdSourcing Spam Blogging?

Jonathan Bailey, March 25, 2010

For spam bloggers, or sploggers as they are often known, copyright is one of the most daunting challenges. It only takes one or two copyright complaints to bring down a spam blog network by alerting the host, destroying a significant amount of work. Likewise, a few complaints to advertisers can strip a splogger of a large percentage of their income.

Because of this, sploggers have been working on ways to feign legitimacy. Doing so helps them stay online longer, since hosts are more reluctant to take them down; helps them build a better rapport with the search engines, their end goal in most cases; and makes them more appealing to human visitors.

They’ve used many tactics to meet this goal, including truncating the content they use in an attempt to comply with fair use, spinning content so that it is unrecognizable and even skipping borrowed content altogether in favor of automatic content generation.

However, several readers have drawn my attention to a new kind of spam site, one that, according to the site itself, gets its readers to submit RSS feeds for inclusion and then tries to hide behind a veil of user-generated content. This idea of crowdsourcing spam is relatively new to me and closely mirrors YouTube’s “wild west” early days, but it is almost certainly going to upset many bloggers who have had their content used without permission.

The Example

Note: All links to the site have been nofollowed. Please visit those links carefully and note that you do so at your own risk. The links are included purely for demonstration purposes.

TheBlogHub is, by all appearances, a very large and prolific spam blog network. It republishes the full RSS feeds from roughly 50,000 sites without truncation and while hotlinking the original source images.

This includes many of the Web’s most popular blogs, including TechCrunch (which appears to be out of date), Mashable and Engadget (also out of date).
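For bloggers who suspect their feed is among those republished, hotlinked images are one of the easier signs to check for. The following is a minimal sketch, not a complete detection tool: it assumes you have already saved the HTML of a suspect page, and the function name `hotlinked_images` and the `example.com` domain are hypothetical placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageSourceCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def hotlinked_images(page_html, my_domain):
    """Return image URLs in page_html that are served from my_domain.

    my_domain is a hypothetical placeholder for the blogger's own domain;
    subdomains (e.g. images.example.com) are counted as matches too.
    """
    collector = ImageSourceCollector()
    collector.feed(page_html)
    hits = []
    for src in collector.sources:
        host = urlparse(src).hostname or ""
        if host == my_domain or host.endswith("." + my_domain):
            hits.append(src)
    return hits
```

If the returned list is non-empty for a page you did not publish, that page is loading images directly from your server, which also means the requests should show up in your own server logs with the other site as the referrer.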

However, according to the site, all the RSS feeds are submitted by users of the service. The exact nature of this service is unclear beyond the site’s mission statement of “to provide quick and easy access to relevant blogs and articles for our guests and members, whilst promoting the respective blogs and their authors.”

But the site does very little to actually promote authors. Not only is the full content used, but the site’s robots.txt file encourages search engines to read the content, making it a direct competitor with the original articles. There is also no link back to the individual posts, just a small link back to the source’s home page at the top of that site’s content.
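Whether a site invites search engines to index republished content can be checked from its robots.txt file. The sketch below uses Python’s standard `urllib.robotparser` module and works on the text of a robots.txt file you have already downloaded; the function name `crawlers_allowed` and the sample URLs are assumptions made for illustration, not details taken from the site discussed here.

```python
from urllib.robotparser import RobotFileParser


def crawlers_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url.

    robots_txt is the raw text of a robots.txt file; url is any page
    on the same site (here, a hypothetical republished post).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A permissive file lets crawlers read everything, republished posts included.
permissive = "User-agent: *\nAllow: /"
# A restrictive file asks all crawlers to stay away.
restrictive = "User-agent: *\nDisallow: /"

print(crawlers_allowed(permissive, "http://example.com/feeds/post-1"))
print(crawlers_allowed(restrictive, "http://example.com/feeds/post-1"))
```

A legitimate aggregator that only wanted to serve its own members could use the restrictive form on its republished pages, which would keep those copies from competing with the originals in search results.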

The site also accepts comments on its service, which has the potential to further fragment the audience and conversation for the blogs involved.

To make matters worse, though the site offers a means to request removal of content, you are required to give some form of verification that the content is yours. Adding a feed to the service, by contrast, appears to require no such verification. The site also offers a means to file a DMCA-like notice, buried in its terms of service, but the email address bounces mail as undeliverable.

Hosted on Web24, the site appears to be based in Australia and has ties to an Australian company whose other sites are advertised heavily on it.

In short, despite the fact that this site proudly proclaims that it is not a spam blog network, it at the very least bears all the signs of being one. If its goal is truly to be a legitimate service, it has many steps that it should take to be more cooperative with the original authors.

Note: Attempts to email the creators of the site via both the listed email address and the address listed to receive DMCA complaints were returned as undeliverable.

Why This is a Problem

If these sites are truly crowdsourcing the locating and addition of RSS feeds, which is up for debate, it can create challenges for content creators whose works are being reused without permission.

First, in some cases, the sites may qualify for safe harbor. If the content is actually provided at the direction of users and the sites can show they did not profit directly from the infringement, they may be able to claim safe harbor. This is heavily muddled, however, by the Grokster ruling, which holds that companies can be held liable for “inducing” copyright infringement. Moreover, this applies only to the U.S., and the issue becomes murkier still when other nations become involved.

Second, hosts will be much less likely to take down such sites if they seem legitimate. Instead, they will more likely pass on any infringement notices to the owners of the site, allowing them the chance to remove it and continue on with the other content.

Finally, content creators will be more inclined to treat these sites as legitimate and contact the owners directly, if possible, to resolve these matters. Even if the site is intentionally or tacitly encouraging infringement and benefiting from it, copyright holders will treat them as if they were other legitimate hosts.

The problem with all of this, however, is that it seems unlikely to me that users would willingly crowdsource a spam blog network. Contributing RSS feeds to a service for “centralization” does not seem likely to attract thousands of visitors. Instead, it seems much more likely to me that these sites merely attempt to give the appearance of legitimacy by pretending that the content is submitted by third parties.

Bottom Line

Still, I have no way of knowing with any certainty what is going on in this particular case. But whether they are actually receiving the feeds from users who are agreeing to their terms of service or simply pretending, the result is the same: scraped content from many thousands of sites, the majority of which almost certainly never gave permission.

A spam blog is a spam blog. Whether it is created intentionally, through recklessness or even simple mistake, the outcome is the same.

As such, the spam blogs need to be dealt with accordingly. Though contacting the owner might be best in cases where it seems to be a simple mistake, such as with an RSS reader that was accidentally exposed to the broader Web, in other cases it is most likely best to go to the hosts or advertisers if possible.

I typically encourage people to try to sort out disagreements over copyright face-to-face. With spammers, however, that is usually a waste of time. As was the case with this site, two letters seeking comment bounced back, including one to the email address supposedly set up to receive notices of copyright infringement.

If your feed is republished on the above site, for example, and you want it removed, you would likely be better off reaching out to its host, especially since its contact addresses no longer work.
