The Spread of Spam

In an interview yesterday with Web Pro News, Matt Mullenweg, WordPress founding developer and founder of Automattic, the company behind WordPress.com, said that his company had removed over 800,000 spam blogs from the service.

Given that the number of blogs powered by the service is a little over two and a half million, that means that approximately one out of every three blogs ever posted was spam.
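As a rough check on that ratio, here is a minimal sketch of the arithmetic. It rounds the blog count to 2.5 million and shows two readings of the figure, since it is an assumption on my part whether that total already includes the removed spam blogs. The variable names are purely illustrative.

```python
removed_spam = 800_000
hosted_blogs = 2_500_000  # "a little over two and a half million", rounded down

# Reading 1 (assumption): the 2.5 million total already includes the removed spam blogs.
share_if_included = removed_spam / hosted_blogs

# Reading 2 (assumption): the 2.5 million counts only the blogs still on the service.
share_if_excluded = removed_spam / (hosted_blogs + removed_spam)

print(f"Spam share if removed blogs are included in the total: {share_if_included:.0%}")
print(f"Spam share if removed blogs are on top of the total:   {share_if_excluded:.0%}")
```

Under the first reading the share is about 32%, or roughly one in three. Under the second it is closer to one in four.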

Though the estimates have been the source of much excitement, they should come as no surprise to those who have followed spam blogs for some time. Furthermore, they pale in comparison to the numbers BlogSpot has historically racked up, which have exceeded 50% in some cases.

The numbers, however, do paint a grim picture regarding spam blogs and show that splogs are no longer confined to the “soft” targets we know well. They’re spreading, or at least attempting to spread, to new services.

This includes, possibly, services you use every day.

A Quick Experiment

To see how 800,000 removed spam blogs translated into the user experience at WordPress.com, I decided to test and see how many spam blogs likely remained behind.

To do that, I logged into the service and used the “Next Blog” feature in the navigation bar to see how many blogs I had to go through to view 30 legitimate blogs. I then repeated the experiment with Google’s BlogSpot service to compare how spam-free the two services were.

To better organize the results, I divided every blog I saw into three categories: confirmed legitimate, definite spam and questionable.

The results were pretty staggering.

BlogSpot

Legitimate Blogs: 30
Obvious Spam: 5
Questionable: 4

WordPress

Legitimate Blogs: 30
Obvious Spam: 0
Questionable: 1

Results

As you can see, with BlogSpot, it took 39 blog viewings to see 30 legitimate blogs. Of the nine that could not be confirmed as legitimate, five were almost certainly spam and four were questionable.

With WordPress.com, it only took 31 viewings to see 30 legitimate blogs and the one that was not confirmed to be legitimate was merely questionable, not a definite spam blog.
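For anyone who wants to check the math, the tallies above can be reproduced with a short sketch. The dictionary layout and the helper name `potential_spam_share` are my own illustrative choices, and "potential spam" here simply means obvious spam plus questionable blogs.

```python
# Counts taken directly from the experiment's results.
tallies = {
    "BlogSpot": {"legitimate": 30, "spam": 5, "questionable": 4},
    "WordPress.com": {"legitimate": 30, "spam": 0, "questionable": 1},
}


def potential_spam_share(counts):
    """Return (total blogs viewed, share that was spam or questionable)."""
    viewed = sum(counts.values())
    flagged = counts["spam"] + counts["questionable"]
    return viewed, flagged / viewed


for service, counts in tallies.items():
    viewed, share = potential_spam_share(counts)
    print(f"{service}: {viewed} viewed, {share:.1%} potential spam")
```

That works out to about 23.1% potential spam on BlogSpot and about 3.2% on WordPress.com. With a sample this small, both figures are rough indicators at best.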

The results are likely somewhat skewed in WordPress’ favor, though. BlogSpot only seemed to show blogs that had been updated that day, which would likely favor spam blogs. Also, both sites, despite having millions of potential blogs to choose from, repeated blogs in their rotation at least twice. Those were discarded, but it was one of the reasons I stopped the experiment at thirty legitimate blogs.

However, what is clear is that the 800,000 spam blogs removed by WordPress are an indication of how effective the site has been at fighting spam and that, despite the numbers, WordPress.com is still relatively spam free.

These numbers also show that BlogSpot has made at least some strides against spam as, in this testing, fewer than 25% of the blogs were potential spam. That is far lower than previous estimates.

Still, this should not be taken as a sign that the tide is turning against spam blogs. Evidence on the ground seems to indicate something much different, and it gives us as Webmasters many reasons to worry.

The Spread of Spam

Though splogs have only rarely penetrated the regular search results at Google or Yahoo!, they’ve been able to effectively poison the Google Blog Search results and continue to make regular appearances on other blog search engines.

But while the reports of scraping and spam blogging still come in regularly to me, what has changed is their location. Where a year ago the spam blog epidemic was restricted to BlogSpot and a handful of “questionable” domain hosts, more services now seem to be touched by the spam blog issue.

Just some of the services I have seen unwittingly hosting spam blogs in the past few months include the following:

  • Multiply: Multiply is not the type of place one typically thinks of for spam blogs but I’ve had at least three reports of victims being scraped and/or plagiarized by someone on that service in the past two weeks. They might have been the subject of a spam blog attack recently. However, this is difficult to test since Multiply does not offer an easy way to browse the blogs without registration.
  • GreatestJournal: The blog site that uses the same backend as LiveJournal has seen several spam blogs take to the service. Though the ones I’ve seen were cleaned off soon after reporting, the combination of a familiar backend with a lesser-known name makes it an appealing target for spammers.
  • Weebly: The new site and blog creation service has been targeted by spammers heavily in recent weeks, many of them using the service as referral links for comment and trackback spam.
  • Domain Hosts: While the prevalence of spam on domain hosts is nothing new, it has traditionally been confined to international hosts and/or hosts that have a reputation for looking the other way. Lately though, I’ve been seeing spam on hosts such as GoDaddy, ThePlanet and other well-known, legitimate services.
  • International Blog Hosts: Finally, while international blog hosts have always been popular targets, they are becoming even more common, especially Czech, Italian, Russian and Portuguese sites.

These are just some of the services, none of them previously spam havens, on which I’ve encountered spam content in the past few months. This is not meant as a criticism of these services, since it is not their fault they drew the attention of spammers, but as an illustration of the types of sites that are becoming more popular with spammers.

What This Means

What this tells me is that the traditional workhorses of spam, BlogSpot and “safe” domain hosts, are no longer working for spammers. Whether this is because of efforts by those hosts to purge spam, as is likely the case with BlogSpot, or due to better search engine detection, remains to be seen.

But what is clear is that spammers, in a bid to stay ahead of the curve, are starting to move on to, or experiment with, other services. Those include services that are lesser known and have neither a strong reputation for spam nor, most likely, the capability to combat a full-on spam attack.

Though these services, and others like them, may not be as well-known as other players in the game, they have a high degree of search engine trust, very “open” structures and, in many cases, limited financial resources. That makes them excellent targets for spammers and means that, over the course of the coming months and years, we’re likely to see most scraping and spam blogging shift even more to those kinds of services.

Conclusions

The bottom line is that spam is spreading rapidly. New services are being targeted and, from the viewpoint of a legitimate Webmaster, that is going to mean a more diverse fight, both against comment spam and against scraping.

Where previously targeting our message at one or two hosts was an effective way to push back against the majority of spammers, the diversification of spam means that both the message and cooperation will have to be much broader in order to create real change.

Even worse, this means that the spam fight is rapidly spreading to hosts that often lack the capability to push back effectively. Where Google, Yahoo! and Microsoft have the resources to tackle spam, even if they are not always fully utilized, it remains to be seen if these new sites will be prepared.

But what is most clear is that spammers are making more effort than ever to blend in with legitimate blogs, in a bid to fool both humans and search engines. This means that many of the “safe havens” we have enjoyed will not be free from spam much longer.

When I originally predicted this spam exodus back in 2006, I did not foresee that the shift would be a slow one. However, much of the shift is clearly well underway and in the coming months we may start to see even more of the impact of the broader push by spammers.

As Webmasters, we need to be aware of this shift, as our content, our comment boxes and our search engine rankings are often innocent bystanders in the fight between junk content and the search engines it targets.

If we know which way the fire is coming from, we might be a little better prepared.

Photo Credit: Image of Spam Blog from Blogspot. “Live” image is from Weebly.
