I would like to take a brief detour into a related topic that has been on my mind for the past few months: comment spam.
Though it doesn’t have much to do with content theft, I have several reasons for wanting to cover it. First, many RSS scrapers and spam bloggers also use comment spam to supplement their work. Second, in some cases the spam comment contains scraped content, either from your site or from others, which makes it an infringement. Finally, it is an issue dear to this site’s target readers: bloggers and Webmasters.
Though WordPress’ track record against spam blogs has been almost impeccable, it has proved very vulnerable to comment spam. This has given rise to an entire cottage industry of anti-spam plugins, most of which, in my experience at least, have been ineffective.
This led me, about a month ago, to disable comments on all old posts. However, I have since backed away from that position because, among other reasons, it simply was not working.
The most effective comment spam plugin I know of, Akismet, is made by Automattic, the operators of WordPress.com. It is a generous gift to the community, and it must come at great expense to Automattic, since it works by having their servers filter the millions of comments that get submitted. However, by Automattic’s own admission it is not perfect, and it does not stop comment spam from being submitted, only from appearing on the site.
Unfortunately, WordPress’ problem with comment spam runs much deeper, and it renders nearly every anti-spam method that works on the comment form useless. However, a change on the backend could potentially fix that and change the comment spam game forever.
How a Comment Gets Posted (and The Problem With It)
A comment in WordPress works just like any other form.
You most likely have a comments.php file in your template that contains the actual comment form. It is embedded into your post pages via a template call. When the form is submitted, it sends the comment to another file, wp-comments-post.php, which passes it up the chain of command and, eventually, places it in the database.
It is a simple form that works like any other, and the process is the same as sending an email via a Web form or posting to a forum. However, the problem is that with WordPress and other spam-prone applications, the backend does not know what the frontend is doing.
Basically, with the default install, wp-comments-post.php has no way to confirm that the comment has come from comments.php or anywhere else on the domain.
Spammers, being the clever lot that they are, simply started calling wp-comments-post.php without ever visiting the site itself. They send a specially formatted request straight to the file and, magically, a comment is submitted even though the bot never set foot on the actual post page.
This is bad news for WordPress users, as nearly all spam countermeasures depend on modifications to comments.php. This includes most CAPTCHAs, spam questions and even some comment-disabling plugins. The spammer simply bypasses those measures, leaving only post-submission filtering to separate the junk from the real comments.
Though that is a fairly effective approach on most sites, sites with large volumes of spam, such as this one, might find it unacceptable. Not only does it mean that some spam is bound to escape the filters and go live, but it can also strain the server, even if, as with Akismet, most of the filtering is done elsewhere.
Furthermore, if email spam has taught us anything, it is that filtering systems are prone to the “better mouse” problem. If one clever spammer finds a way to game the system, the hull is breached and everything could be flooded.
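The minimal Python sketch below makes the problem concrete. It is a hypothetical handler, not WordPress’s actual code, and like the default setup it only checks that the expected fields are present. The field names (`comment_post_ID`, `author`, `email`, `comment`) are modeled on WordPress’s comment form. Two submissions are shown: one typed into the form on the post page, and one assembled by a script that never loaded the page. The handler has no way to tell them apart.

```python
# Hypothetical stand-in for a comment backend that trusts any request.
REQUIRED_FIELDS = ("comment_post_ID", "author", "email", "comment")

def accept_comment(fields: dict) -> bool:
    """Accept a comment if every required field is present and non-empty.

    Nothing in the request tells the backend whether it came from the
    comment form on the post page or was sent straight to it by a bot.
    """
    return all(fields.get(name, "").strip() for name in REQUIRED_FIELDS)

# A submission from a real visitor filling in the form on the post page...
from_form = {"comment_post_ID": "42", "author": "Reader",
             "email": "reader@example.com", "comment": "Great post!"}
# ...and one built by a script that never loaded the page.
from_bot = {"comment_post_ID": "42", "author": "Cheap Pills",
            "email": "bot@example.com", "comment": "Buy now!"}

print(accept_comment(from_form), accept_comment(from_bot))  # True True
```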
How Bad Is It?
The problem is rampant. Consider this screenshot taken from my own site stats yesterday.
You can see that wp-comments-post.php is the fourth most requested file on my server. (Note: share-this.php and the ajax-edit-comments files are often called multiple times on a single page, which is why they rank so high.) A quick check of the comment count shows that legitimate comments cannot account for that much traffic.
There are hundreds of hits per day on that file, most of which never access the site itself.
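You can reproduce this kind of check from your own access logs. The Python sketch below assumes Apache’s “combined” log format, and the sample lines use made-up addresses. It counts POSTs to wp-comments-post.php from IP addresses that requested nothing else in the same log. This is only a rough signal. A single log file may miss earlier visits, and a bot that fetches the post page first will not be flagged. The Referer field is deliberately ignored because the client can fake it.

```python
import re
from collections import defaultdict

# Matches the start of an Apache "combined" log line:
# client IP, identd, user, [timestamp], "METHOD PATH PROTOCOL"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')

def direct_comment_posters(lines):
    """Map IP -> number of comment POSTs, for IPs that requested nothing else."""
    comment_posts = defaultdict(int)
    other_requests = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip malformed or non-matching lines
        ip, method, path = match.groups()
        if method == "POST" and path.split("?")[0].endswith("/wp-comments-post.php"):
            comment_posts[ip] += 1
        else:
            other_requests[ip] += 1
    return {ip: n for ip, n in comment_posts.items() if other_requests[ip] == 0}

sample = [
    '203.0.113.5 - - [10/Oct/2007:13:55:36 -0400] "GET /2007/10/some-post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2007:13:57:02 -0400] "POST /wp-comments-post.php HTTP/1.1" 302 0 "http://example.com/2007/10/some-post/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2007:14:01:11 -0400] "POST /wp-comments-post.php HTTP/1.1" 302 0 "http://example.com/2007/10/some-post/" "Mozilla/5.0"',
]
print(direct_comment_posters(sample))  # {'198.51.100.7': 1}
```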
One workaround is to edit your .htaccess file to block requests to wp-comments-post.php whose referrer is not a page on your own domain. I implemented this myself on Plagiarism Today but, while my comment spam volume decreased somewhat, it did not stop. Spoofing a referrer is trivial, and it seems that most comment spammers already do so.
Yet another hack involves enforcing a minimum time between comment submissions. This stops spammers who “flood” your comments, but it does nothing against spammers who post once and come back at a random time later to post again.
One final method, which I ran across some time ago but have been unable to locate again for this article, involved adding code to comments.php that produced a value which wp-comments-post.php would then verify. Though it was a messy edit that involved hacking both files, it would, theoretically, have been effective. Once I locate the hack again, I will try it and see whether it does indeed work.
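The sketch below restates the logic of that referrer rule in Python. It is not the .htaccess rule itself, and the function name and `example.com` domain are placeholders. It shows why the rule is weak: the check can only inspect text that the client chose to send.

```python
from urllib.parse import urlsplit

def referrer_is_local(referer_header, site_host):
    """Allow a comment POST only if its Referer header names our own host.

    The header is supplied by the client, so this inspects only what the
    client chose to send; it proves nothing about an actual page visit.
    """
    if not referer_header:
        return False  # treat a missing Referer as suspicious
    host = urlsplit(referer_header).hostname or ""
    return host == site_host or host.endswith("." + site_host)

# A real visitor's browser sends the post page as the Referer, and a bot can
# send this exact same string without ever loading the page:
print(referrer_is_local("http://example.com/2007/10/some-post/", "example.com"))  # True
# Only a bot that doesn't bother faking the header gets caught:
print(referrer_is_local("http://spam.example.net/", "example.com"))  # False
```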
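The following sketch shows a per-IP minimum interval in Python and illustrates why it only catches flooding. The class name and the 60-second interval are hypothetical choices for illustration, not a particular plugin’s behavior. Timestamps are passed in explicitly so the example is easy to follow.

```python
import time

class CommentThrottle:
    """Reject a comment if the same IP had one accepted less than min_interval seconds ago."""

    def __init__(self, min_interval=60.0):
        self.min_interval = min_interval
        self.last_seen = {}  # IP address -> time of its last accepted comment

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(ip)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous comment: looks like a flood
        self.last_seen[ip] = now
        return True

throttle = CommentThrottle(min_interval=60)
print(throttle.allow("198.51.100.7", now=0))     # True: first comment
print(throttle.allow("198.51.100.7", now=5))     # False: a flood is caught
print(throttle.allow("198.51.100.7", now=3600))  # True: a patient spammer is not
```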
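The hack I describe above may have worked differently. The Python sketch below shows a generic version of the same idea using one-time form tokens. When the post page is rendered, the server issues a random token and places it in the form. The backend accepts a comment only if it carries a token that was issued for that post and has not been used yet. The class and method names are hypothetical. A real version would need to store the tokens somewhere persistent, such as the database, rather than in memory. A bot can still load the page to obtain a token, but it can no longer skip the page entirely.

```python
import secrets

class FormTokens:
    """Hypothetical one-time tokens: the form page issues one, the backend redeems it."""

    def __init__(self):
        self.issued = {}  # token -> post ID it was issued for (in memory only)

    def issue(self, post_id):
        """Called when the post page (the comments.php side) is rendered."""
        token = secrets.token_urlsafe(16)
        self.issued[token] = post_id
        return token  # embedded in the form as a hidden field

    def redeem(self, token, post_id):
        """Called by the backend; each token works once, and only for its own post."""
        return self.issued.pop(token, None) == post_id

tokens = FormTokens()
t = tokens.issue(post_id=42)
print(tokens.redeem(t, 42))          # True: the comment came through the form
print(tokens.redeem(t, 42))          # False: replaying the same token fails
print(tokens.redeem("guessed", 42))  # False: the bot never loaded the page
```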
In the end though, short of hacks and server alterations, there is no way to prevent this kind of injection. Since almost all plugins deal only with the comments.php file, there is no simple way to effectively block this kind of abuse.
Fixing the Problem
This problem is not unique to WordPress by any stretch of the imagination, and none of this should be taken as a criticism of WordPress or its developers. The same problem exists on other blogging platforms, message board applications and nearly anything that accepts input from the outside world and posts it to the Web. WordPress merely happens to be what I use and what I am most familiar with.
That being said, there needs to be a fix for this problem. There needs to be some way for the backend, wp-comments-post.php, to ensure that the comment actually came from the frontend, comments.php.
One solution involves adding a generic anti-spam question to the comments file and then hacking wp-comments-post.php to die if the answer does not match. Anyone calling the backend directly without knowing the question would get an error.
However, a static method like the one described in the post could easily be beaten by a spammer who simply adds the expected variable to their software. A more randomized implementation, such as the one described in the comments, would provide more protection, but it could still be figured out, since computers are very good at math.
I am not a programmer, but what seems to be needed is a way for the two files to handshake with one another that a spammer cannot crack. One example might be to create a hash of the comment using a key that exists only on the server. Another would be to use a shared pseudorandom value, such as output from a random number generator, the time on the server’s clock or anything else the two files could share. Another idea would be to have the backend check the server logs and ensure that, at the very least, the IP address involved visited the post page in question before commenting.
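Here is the static-question approach as a Python sketch. The question, the expected answer and the `spam_answer` field name are hypothetical. Raising an exception plays the role of PHP’s `die()`. A direct call to the backend that lacks the answer is rejected. The weakness is also easy to see: once a spammer learns to send `spam_answer=blue`, the check passes every time.

```python
# Hypothetical question/answer pair. In the PHP hack, the question would be
# printed by comments.php and the answer checked in wp-comments-post.php.
SPAM_QUESTION = "What color is the sky on a clear day?"
EXPECTED_ANSWER = "blue"

class CommentRejected(Exception):
    """Raised where the PHP version would call die()."""

def check_spam_question(fields):
    """Reject the comment unless it carries the expected answer."""
    answer = fields.get("spam_answer", "").strip().lower()
    if answer != EXPECTED_ANSWER:
        raise CommentRejected("Wrong or missing answer to the anti-spam question.")

check_spam_question({"comment": "Nice!", "spam_answer": " Blue "})  # passes silently
try:
    check_spam_question({"comment": "Buy now!"})  # direct call with no answer
except CommentRejected as err:
    print(err)
```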
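The “hash with a server-only key” idea roughly corresponds to a standard technique: signing a value with HMAC. The Python sketch below shows one possible version. It is an assumption, not a WordPress feature. The comment text is not known when the form is rendered, so the sketch signs the post ID and the time the form was served, which covers the shared-clock idea as well. The secret key, the function names and the 5-second and one-hour limits are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-long-random-server-only-key"  # hypothetical; never sent to browsers

def _sign(post_id, issued_at):
    message = f"{post_id}:{issued_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def make_form_token(post_id, now=None):
    """Front end: put this in a hidden field when rendering the comment form."""
    issued_at = int(time.time() if now is None else now)
    return f"{issued_at}:{_sign(post_id, issued_at)}"

def verify_form_token(token, post_id, now=None, min_age=5, max_age=3600):
    """Back end: accept only tokens this server signed for this post, recently."""
    try:
        issued_str, signature = token.split(":", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False  # missing or malformed token
    if not hmac.compare_digest(signature, _sign(post_id, issued_at)):
        return False  # forged, or signed for a different post
    age = (time.time() if now is None else now) - issued_at
    return min_age <= age <= max_age  # too fast looks like a bot; too old is stale

token = make_form_token(42, now=1_000_000)
print(verify_form_token(token, 42, now=1_000_030))               # True
print(verify_form_token(token, 43, now=1_000_030))               # False: wrong post
print(verify_form_token(token, 42, now=1_000_001))               # False: submitted after 1 second
print(verify_form_token("1000000:deadbeef", 42, now=1_000_030))  # False: forged
```

The backend needs no list of issued tokens because the token is signed rather than stored. The trade-off is that a token can be reused until it expires, and a bot can still fetch the page to get a fresh one. What the scheme does guarantee is that spam must pass through the front end, where other tests can be applied.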
(Note: The above suggestions are offered “off the cuff” and probably would not work. Please post suggestions and ideas in the comments.)
This would not be easy. It might require rethinking the entire comment posting process, but surely there is a way to at least improve the situation so that spammers cannot abuse the system with ease.
I am open to any and all suggestions on the process. Please comment below if you have any thoughts.
Some Brief Good News
I recently ran across some good news in this fight. I installed reCAPTCHA on my blog a few days ago as an experiment. Though it didn’t stop the flow of spam comments, it greatly improved Akismet’s accuracy.
It appears that, for whatever reason, Akismet has an easier time dealing with comment spam when it comes almost solely from the backend. Since I installed reCAPTCHA, I have not had any spam comments go live or enter the moderation queue.
I plan to continue the experiment for at least a few more days to see if that trend continues.
(UPDATE: I just received an email from Ben Maurer, the tech lead on the reCAPTCHA project. He said that reCAPTCHA marks the spam it catches as spam in WordPress, which could explain why Akismet seems so accurate. Still, what intrigues me most is that no spam has gotten all the way through. It seems logical that reCAPTCHA is blocking the spam that actually uses the form, which was the spam getting through from time to time, while Akismet easily handles the spam injected directly through the backend.)
(UPDATE 2: As my education on reCAPTCHA continues, it appears that the plugin DOES validate comments injected directly into the backend. That officially makes it my favorite anti-spam plugin.)
Closing this backdoor will not be easy, nor will it obliterate comment spam. However, channeling spam through the traditional forms makes it possible to apply various Turing tests, such as CAPTCHAs, to weed out the bots.
In short, it won’t put an end to comment spam or replace filtering, but at least it will add an extra line of defense.
Right now, WordPress users are just one clever spammer away from a tidal wave of spam. If someone can find a way to beat Akismet and other spam filtering plugins, there is no backup plan.
Perhaps now, while the situation is somewhat in hand, it is time we started working on one.