A recent study by WebmasterWorld found that an estimated 77% of all blogs on Google’s Blogspot service were spam. Similarly, AOL Hometown had well over 80% of its results turn out to be spam. Even MSN Spaces, which was not mentioned in the report, is claimed to host an estimated ten percent of spammer Web sites. (Note: See updated information below about the Blogspot study.)
It seems as if nearly every major free blog hosting service has been either overrun or nearly overrun with spam. However, one service stands alone, a relative oasis of spam cleanliness: Automattic’s WordPress.com. Despite being just as free as its competitors and placing few restrictions on registration, WordPress.com has not endured the spam avalanche that other services have.
Though there have been spam attacks in the past, the spammers have been easily shut down and, overall, the service remains relatively free of the splogs that seem to choke up its competitors. Though paid services such as Typepad also enjoy a relatively spam-free existence, what WordPress.com does is very rare for a free service.
To find out how WordPress.com achieved this, I emailed Automattic’s founder, Matthew Mullenweg. The answer was very surprising.
A Technological Edge
Automattic, in addition to creating WordPress.com, created the anti-comment spam plugin Akismet.
Akismet works by taking comments submitted to a site and forwarding them to Akismet’s servers. Akismet then, using a series of tests and filters, determines if the comment is legitimate, spam or something in between. Spam comments are filtered out, gray comments are held for moderation and legitimate comments are posted.
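The three-way split described above (spam, gray, legitimate) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the word list, the scoring formula, the thresholds and the names `spam_score` and `triage` are hypothetical and are not Akismet's actual tests, which run on Akismet's own servers and are not public.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    LEGITIMATE = "post"
    GRAY = "hold for moderation"
    SPAM = "filter out"


@dataclass
class Comment:
    author: str
    body: str


# Hypothetical stand-in for the remote tests and filters; not Akismet's real rules.
SUSPICIOUS_WORDS = ("casino", "pills", "cheap loans")


def spam_score(comment: Comment) -> float:
    """Return a made-up score in [0, 1]; higher means more spam-like."""
    text = comment.body.lower()
    hits = sum(word in text for word in SUSPICIOUS_WORDS)
    links = text.count("http://") + text.count("https://")
    return min(1.0, 0.4 * hits + 0.2 * links)


def triage(comment: Comment, spam_at: float = 0.7, gray_at: float = 0.3) -> Verdict:
    """Sort a comment into one of the three buckets the article describes."""
    score = spam_score(comment)
    if score >= spam_at:
        return Verdict.SPAM        # filtered out
    if score >= gray_at:
        return Verdict.GRAY        # held for a human moderator
    return Verdict.LEGITIMATE      # posted right away
```

The point of the middle bucket is that an uncertain comment costs a moderator a few seconds instead of either annoying a real reader or letting junk through.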
Akismet is available for free for personal use on any WordPress blog with an API key, not just those hosted by WordPress.com. To date, Akismet has stopped over one billion spam comments and is used on thousands of blogs, including this one.
Both the WordPress.com site and Mullenweg hint that Akismet is one of their tools for keeping spam blogs off of WordPress.com. Though both are vague in their descriptions of how it works, one likelihood is that any caught comment spam originating from or pointing to a WordPress.com blog flags that blog for inspection.
If true, this effectively turns comment spamming, one of the most popular means of promoting a spam blog, against the person doing it. Comment spamming goes from being a tool to help search engines find a blog to a means for administrators to easily identify the blogs that are likely junk.
That would be an interesting reversal of fortune for spammers and a very intelligent use of a seemingly unrelated technology.
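To make that guess concrete, here is a rough sketch of how caught comment spam could be turned into a list of hosted blogs to inspect. This is speculation in code form, not Automattic's method: the function names, the URL pattern and the threshold are all hypothetical, and the only thing taken from the discussion above is the idea that links inside caught spam point back at the blogs being promoted.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Loose URL matcher, good enough for an illustration.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
HOSTED_SUFFIX = ".wordpress.com"  # assumption: hosted blogs live on subdomains


def hosted_blogs_in(text: str) -> set[str]:
    """Return the hosted blog hostnames linked from one spam comment."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        host = (urlparse(url).hostname or "").lower()
        if host.endswith(HOSTED_SUFFIX):
            hosts.add(host)
    return hosts


def blogs_to_inspect(caught_spam: list[str], threshold: int = 2) -> list[str]:
    """List hosted blogs promoted in at least `threshold` caught spam comments."""
    counts = Counter()
    for body in caught_spam:
        counts.update(hosted_blogs_in(body))
    return sorted(host for host, n in counts.items() if n >= threshold)
```

A human would still make the final call. The sketch only narrows down where to look, which fits Mullenweg's emphasis later in this article on people reviewing reports.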
In late 2005, WordPress.com took what some considered an extreme step and banned Google Adsense as well as other advertising networks from its service. As of this writing, there is no way to add any advertisements to a WordPress.com hosted blog, other than “discreet” links, without a paid VIP membership.
This is obviously a tremendous deterrent to spam blogs, many of which rely upon Google Adsense to make money. This is in stark contrast to Blogspot, which makes it very easy to add Adsense ads to your blog and encourages members to do so.
Though Google’s reasons for doing this are clear (it is how they make money from the service), the prevalence of Adsense has undoubtedly been a major contributor to the deluge of spam that has befallen the service. That is also why Mullenweg, in a comment on TechCrunch, said the following:
We’re considering ad options for the future, but for now disallowing adsense has been a huge help in keeping splogs off the system and hasn’t gotten much pushback from regular folks, only aspiring pro-bloggers. (Who should probably be on WordPress.org anyway.)
While it is true that this has no impact on the spam bloggers who are solely interested in using splogs to gain search engine ranking via outbound links, this missing functionality does a great deal to deter many of the laziest and least sophisticated spam bloggers out there.
All in all, while the lack of simple monetization might be a hurdle for some would-be WordPress.com users, it has played a critical role in keeping the service free of spam blogs.
The Real Difference
But while Mullenweg was clearly pleased with the role that Akismet and other tools played in stopping spam, he put the greatest difference on the human element.
According to him, Automattic takes spam very seriously and always has and that, in his view, makes the greatest difference of all. In his email to me he said the following:
If you ever come across something we host that’s spam just drop the link there and someone will look at it within an hour or so.
Akismet and a few other internal tools help, but I think it’s mostly that we take splogs pretty seriously and respond accordingly.
Mullenweg encourages people to use the “Report as Spam” feature in the dashboard across the top of all WordPress.com blogs to report any instance of spam. He says that all reports of spam are tracked and followed up on swiftly.
This, in Mullenweg’s view, has kept spam from establishing a foothold on the service and kept WordPress.com relatively spam-free when compared to its competitors. Hopefully, it will be enough to keep it that way.
The good news in all of this is that it is possible to run a large-scale, popular and free blogging service that is relatively free of spam. The bad news is that there is no magic bullet in any of this.
Running such a service requires a great deal of commitment both from the people who run the service and from the community that uses it. It requires investing both resources and manpower into combating spam while having a genuine dislike for it. It even requires, in some cases, sacrificing features that legitimate users may want in order to make the service less appealing to the spam blogging community.
It also means that it may be far too late for Blogspot and similar services to turn the tide against spam. Though WordPress.com seems able to keep up easily with the new spam that comes in, it appears that, once over three quarters of your results are junk, reversing the tide is all but impossible.
However, if Google were to take the simple but drastic step of banning Adsense on Blogspot, the effect on spam blogs would be severe. The effect on their legitimate bloggers, though, would be equally dramatic, causing many of them to turn away from the service.
This puts Google, and the other free blogging services, in a very tough bind. In order to effectively combat spam, they need to make sacrifices that will, most likely, cause them to lose legitimate customers as well as spammers. It almost comes down to a choice between being a spam haven and having their entire business model destroyed.
In that regard, spam blogs are like a cancer, often easily treated if caught and attacked early, but incurable if allowed to go on too long. Sadly, Blogger, AOL Hometown and MSN Spaces may be beyond any hope of recovery.
This is an issue I will be revisiting some time later this week.
Update: This article has really taken off. An appearance on Techmeme as well as Matt Mullenweg’s blog have really drawn a great deal of attention to this. So welcome to everyone who is visiting this site for the first time. Feel free to look around some and subscribe to the feed if you wish.
I did want to take a moment and respond to one very astute commenter who pointed out that all is not what it seems with the Blogspot study. As it turns out, the methodology of the study is both buried and confusing. The 77% figure applies to the Blogspot blogs that turn up for spam-friendly keywords. It is not a reference to the share of spam blogs on the service overall.
However, after thinking about it, I realized that a study of all the blogs on Blogspot would be almost useless, as Blogspot, in addition to splogs, is choked with inactive and abandoned blogs, the same as any free blogging service or free Web service in general.
A better study would be to look at the percentage of active blogs on the service, something that can be determined, at least to some degree, by the number of outgoing pings. A study from February of this year looked at exactly that and found that 51% of all pings from Blogspot were spam. (Note: According to Pete, this study was taken before Blogspot began pinging new entries by default, so it may be biased toward spammers, as they might be more likely to switch on the pinging feature. We will have to wait and see when a new study comes out.)
This means that over half of all new posts created and pinged out over Blogspot are junk. Though not the 77% mentioned earlier, that is still a tremendous problem. The fact remains that Blogger is, quite clearly, overrun with spam blogs and is unlikely to recover any time soon, not without making drastic changes.
My thanks to Pete for pointing out the error.