The Spread of Spam

In an interview yesterday with Web Pro News, Matt Mullenweg, WordPress founding developer and founder of Automattic, the company behind WordPress.com, said that his company had removed over 800,000 spam blogs from the service.

Given that the number of blogs powered by the service is a little over two and a half million, that means that approximately one out of every three blogs ever posted was spam.
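As a rough check on that ratio, the arithmetic can be sketched in a few lines of Python. The figures are the rounded ones quoted above, and the function name `spam_share` is purely illustrative. The interview summary does not say whether the 2.5 million total already includes the removed blogs, so the sketch shows both readings as assumptions.

```python
def spam_share(removed: int, total: int, total_includes_removed: bool = True) -> float:
    """Return the fraction of all blogs ever created that were removed as spam."""
    if total_includes_removed:
        # Assumption 1: the quoted total already counts the removed spam blogs.
        ever_created = total
    else:
        # Assumption 2: the quoted total counts only the blogs still hosted.
        ever_created = total + removed
    return removed / ever_created


REMOVED = 800_000    # "over 800,000", rounded down
TOTAL = 2_500_000    # "a little over two and a half million", rounded down

print(f"{spam_share(REMOVED, TOTAL):.0%}")                                # 32%
print(f"{spam_share(REMOVED, TOTAL, total_includes_removed=False):.0%}")  # 24%
```

Under the first reading the share is roughly one in three; under the second it is closer to one in four.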

Though the estimates have been the source of much excitement, they should come as no surprise to those who have followed spam blogs for some time. Furthermore, they pale in comparison to the numbers BlogSpot has historically racked up, which have included estimates well over 50% in some cases.

The numbers, however, do paint a grim picture regarding spam blogs and show that splogs are no longer confined to the “soft” targets we know well. They’re spreading, or at least attempting to spread, to new services.

This includes, possibly, services you use every day.

A Quick Experiment

To see how 800,000 removed spam blogs translated into the user experience at WordPress.com, I decided to test and see how many spam blogs likely remained behind.

To do that, I logged into the service and used the “Next Blog” feature in the navigation bar to see how many blogs I had to go through to view 30 legitimate blogs. I then repeated the experiment with Google’s Blogspot service to compare how spam-free the two services were.

To better organize the results, I divided every blog I saw into three categories: confirmed legitimate, definite spam and questionable.

The results were pretty staggering.

Blogspot

Legitimate Blogs: 30
Obvious Spam: 5
Questionable: 4

WordPress

Legitimate Blogs: 30
Obvious Spam: 0
Questionable: 1

Results

As you can see, with BlogSpot, it took 39 blog viewings to see 30 legitimate blogs. Of the nine that could not be confirmed as legitimate, five were almost certainly spam and four were questionable.

With WordPress.com, it only took 31 viewings to see 30 legitimate blogs and the one that was not confirmed to be legitimate was merely questionable, not a definite spam blog.

The results are likely somewhat skewed in WordPress’ favor though. BlogSpot only seemed to show blogs that had been updated that day and that would likely favor spam blogs. Also, both sites, despite having millions of potential blogs to choose from, repeated blogs in their rotation at least twice. Those were discarded, but it was one of the reasons I stopped the experiment at thirty legitimate blogs.

However, what is clear is that the 800,000 spam blogs removed by WordPress are an indication of how effective the site has been at fighting spam and that, despite the numbers, WordPress.com is still relatively spam free.

These numbers also show that BlogSpot has made at least some strides against spam as, in this testing, less than 25% of the blogs were potential spam. That is far less than previous estimates.
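For anyone who wants to repeat the count, the percentages can be reproduced from the tallies with a short Python sketch. The `Tally` class and its field names are hypothetical conveniences, not part of any service's tooling, and "potential spam" here simply means obvious spam plus questionable blogs.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    legitimate: int
    obvious_spam: int
    questionable: int

    @property
    def viewed(self) -> int:
        # Total number of distinct blogs viewed during the test.
        return self.legitimate + self.obvious_spam + self.questionable

    @property
    def potential_spam_share(self) -> float:
        # Obvious spam plus questionable blogs, as a fraction of all viewed.
        return (self.obvious_spam + self.questionable) / self.viewed


results = {
    "Blogspot": Tally(legitimate=30, obvious_spam=5, questionable=4),
    "WordPress": Tally(legitimate=30, obvious_spam=0, questionable=1),
}

for name, tally in results.items():
    print(f"{name}: {tally.viewed} viewed, {tally.potential_spam_share:.1%} potential spam")
# Blogspot: 39 viewed, 23.1% potential spam
# WordPress: 31 viewed, 3.2% potential spam
```

With samples this small, the percentages are only a rough indication and should not be read as precise rates for either service.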

Still, this should not be taken as a sign that the tide is turning against spam blogs. Evidence on the ground seems to indicate something much different, and it gives us as Webmasters many reasons to worry.

The Spread of Spam

Though splogs have only rarely penetrated the regular search results at Google or Yahoo!, they’ve been able to effectively poison the Google Blog Search results and continue to make regular appearances on other blog search engines.

But while the reports of scraping and spam blogging still come in regularly to me, what has changed is their location. Where a year ago the spam blog epidemic was restricted to BlogSpot and a handful of “questionable” domain hosts, more services now seem to be touched by the spam blog issue.

Just some of the services I have seen unwittingly hosting spam blogs in the past few months include the following:

  • Multiply: Multiply is not the type of place one typically thinks of for spam blogs but I’ve had at least three reports of victims being scraped and/or plagiarized by someone on that service in the past two weeks. They might have been the subject of a spam blog attack recently. However, this is difficult to test since Multiply does not offer an easy way to browse the blogs without registration.
  • GreatestJournal: The blog site that uses the same backend as LiveJournal has seen several spam blogs take to the service. Though the ones I’ve seen were cleaned off soon after reporting, the combination of a familiar backend with a lesser-known name makes it an appealing target for spammers.
  • Weebly: The new site and blog creation service has been targeted by spammers heavily in recent weeks, many of them using the service as referral links for comment and trackback spam.
  • Domain Hosts: While the prevalence of spam on domain hosts is nothing new, it has traditionally been confined to international hosts and/or hosts that have a reputation for looking the other way. Lately though, I’ve been seeing spam on hosts such as GoDaddy, ThePlanet and other well-known and legitimate services.
  • International Blog Hosts: Finally, while international blog hosts have always been popular targets, they are becoming even more common, especially Czech, Italian, Russian and Portuguese sites.

These are just some of the services that were not previously spam havens but on which I’ve encountered spam content in the past few months. This is not meant as a criticism of these services, since it is not their fault they drew the attention of spammers, but as an illustration of the types of sites that are becoming more popular with spammers.

What This Means

What this tells me is that the traditional workhorses of spam, BlogSpot and “safe” domain hosts, are no longer working for spammers. Whether this is because of efforts by those hosts to purge spam, as is likely the case with BlogSpot, or due to better search engine detection, remains to be seen.

But what is clear is that spammers, in a bid to stay ahead of the curve, are starting to move on to, or experiment with, other services. Those include services that are lesser known and have neither as strong a reputation for spam nor, most likely, the capability to combat a full-on spam attack.

Though these services, and others like them, may not be as well-known as other players in the game, they have a high amount of search engine trust, very “open” structures and, in many cases, limited financial resources. That makes them excellent targets for spammers and means that, over the course of the coming months and years, we’re likely to see most scraping and spam blogging shifting even more to those kinds of services.

Conclusions

The bottom line is that spam is spreading, rapidly. New services are being targeted and, from the viewpoint of a legitimate Webmaster, that is going to mean a more diverse fight, both against comment spam and against scraping.

Where previously targeting our message at one or two hosts was an effective way to push back against the majority of spammers, the diversification of spam means that both the message and cooperation will have to be much broader in order to create real change.

Even worse, this means that the spam fight is rapidly spreading to hosts that often lack the capability to push back effectively. Where Google, Yahoo! and Microsoft have the resources to tackle spam, even if they are not always fully utilized, it remains to be seen if these new sites will be prepared.

But what is most clear is that spammers are making more effort than ever to blend in with legitimate blogs, in a bid to fool both humans and search engines. This means that many of the “safe havens” we have enjoyed will not be free from spam much longer.

When I originally predicted this spam exodus back in 2006, I did not foresee that the shift would be a slow one. However, much of the shift is clearly well underway and in the coming months we may start to see even more of the impact of the broader push by spammers.

As Webmasters, we need to be aware of this shift as our content, our comment boxes and our search engine rankings are often innocent bystanders in the fight between junk content and the search engines it targets.

If we know which way the fire is coming from, we might be a little better prepared.

Photo Credit: Image of Spam Blog from Blogspot. “Live” image is from Weebly.

22 comments
Jonathan Bailey

@Dr. Mike Wendell -
I've read and replied to your original post so I don't have much to say. However, those were the honest results of my test. I've never said that there is no spam on WordPress.com, just that it is far lower than on other sites, such as Blogspot.

I doubt that there is a way to run a free blogging service without having a spam problem...

Dr. Mike Wendell

Only 1 questionable on wordpress.com? That seems low to me. I found just under 100 on Sunday in about an hour using Google's blogsearch and a few common terms typical of splogs. (ie Make money fast, the domains that cj.com use, etc.)

I've reported them but as of right now, they're all still online.

Jonathan Bailey

Voyagerfan: To pass on some immortal advice from my father, just take the compliment and run, let's not argue numbers! I have no idea what is considered popular anymore. I'm certain many readers of this site have blogs that are far more popular than it.

There are a lot of things Google could do to make this problem easier, but at this point it's going to require a major shift in how Blogspot operates to have any major impact. I don't see that happening any time soon.

But yes, a real domain name, a real WP/MovableType install and a good professional layout, rather than one of Blogger's defaults, will go a long way.

Google Groups is insane. I've hit a point that I just ignore anything from groups.google.com. I haven't seen anything legitimate there in months.

Out of curiosity, which comment tracker were you using? I use co.mments.com myself...

Voyagerfan5761

Spreading butter around is usually a good thing because it gives a more uniform taste to your toast. Spreading spam around just makes the world a more annoying place to Internet. I am happy to be considered one of those "few blogs that are hosted on Blogspot" you consider legitimate. (Does 1,400 visits a month count as popular? I'm not so sure... But I appreciate the compliment. :-) )

If Google were to spend more time actively looking for splogs, or make it easier to flag a blog as spam (the navbar, with its button, is altogether too easy to hide), or even tighten the ToS so splogs are less legitimate under the rules of the hosting site... The world would probably be a better place for it.

All the more reason to get myself a real domain name, I guess. If I do that, I can start putting other services on it as well; custom domains seem to be the latest trend in free amenities.

Google Groups does seem to have a large amount of spam going on, too. It's gotten to the point of being ridiculous.

PS:
Yes, I know it's been days since the last comment, but the comment tracker I use has apparently been slacking on its checks lately... Grr...

Jonathan Bailey

Forrest: Reading the comments from the black hat seo types, there does appear to be at least some benefit from having an edu domain.

You are right though that the vast majority are coming from other domains. I'm seeing a lot of items from Weebly, Google Groups and your usual Blogspot crowd. It still seems that the "soft" targets are the ones being targeted the most because it is still a battle of quantity over quality.

I seriously doubt that the number ever got up to 93% but you are right that the perception was/is there. Right now, other than Voyagerfan, I can only think of a few blogs that are hosted on Blogspot that I would consider popular and legitimate. Even among the legitimate bloggers, it seems to be favored by sex bloggers due to the lack of adult content restrictions.

It truly is the perfect site for spam isn't it?

Forrest

I've noticed a lot of .edu spam myself lately. Whether it's true or not, there's a perception that they get special loving treatment from search engines.

But that's a drop in the bucket compared to the flood of comment and trackback spam I seem to attract. Today a comment got past Akismet from a sex blog on a legit seeming domain. The comment itself was questionable, so I had a look at the site. What's interesting, to me at least, is that the text wasn't just stolen from somewhere else, like a lot of the trackback spam I get; it was pure gibberish. Almost certainly created by "content generator" software. This is a lot less irritating than someone stealing my work ... if it would just stay out of my corner of the network, I'd be happy.

Personally, I've read reports suggesting as many as 93 % of the blogs hosted on blogspot are spam. This always seemed a little shrill ... again, with that perception, I'm not surprised they're cracking down and causing a mass exodus.

Jonathan Bailey

Voyagerfan: Just think of it as butter being spread on bread. It was concentrated in one part and now they're spreading the joy around. Somehow that doesn't feel gratifying...

Voyagerfan5761

Good news: The site my blog is currently hosted on is becoming slightly less spammy.

Bad news: Spam overall seems to be becoming more and more of a problem.

I'll take the good news, thank you. :-/

Jeremy Steele

Yeah... we can thank those darn SEOers for causing that little problem... Good to know I'm not the only one noticing it. Thanks.

Jonathan Bailey

I've noticed it some but the controls on EDU domains pretty much make it so that those are cases where someone has hacked into an established system or is misusing a site's resources. Those are still fairly rare but, yes, they are becoming more common as loopholes and workarounds are being discovered. This is because EDU domains have a high trust in the search engines and are located in very unique IP ranges so they can pass on a lot of search engine juice to other sites.

Jeremy Steele

I've noticed more EDU domains hosting splogs and other questionable spam pages as well. Is this something you've seen more of?
