Akismet and Spam Blogs
Over the past few weeks, especially since the recent trackback spam attack, I’ve had some time to ponder anti-comment spam technology and how it relates to fighting content theft.
A recent post on the Akismet Blog particularly caught my eye. The post, entitled “Is It Spam?”, details the shift in spammer tactics and how some users have fallen for it, marking spam comments “not spam” against Akismet’s better judgment.
However, it was one of the last paragraphs that really got me thinking:
In the case of the pingbacks (the ones that start […]) the spammers are actually stealing your work…
Though that fact is clear to anyone who reads this site regularly, it occurred to me that our aggressive blocking of trackback and pingback spam may be preventing us from identifying spam bloggers and shutting them down.
Curious, I decided to check my own spam files and was more than a little surprised at what I found.
The Problem
As wonderful as Digital Fingerprints and other anti-splog features are, trackbacks and pingbacks remain one of the most common ways for a spam blog to be identified.
The reason is simple: spammers, usually in a bid to either appear legitimate or obtain some additional incoming links, often provide links to the original post. Those links, due to the nature of the software they use, cause trackbacks to be sent to the original site, and those trackbacks can lead Webmasters right to the people misusing their content.
Alternatively, bloggers often link to other articles on their own site, as I’ve done with this one, and those links get picked up when the article is scraped. Those links, in turn, produce trackbacks that can be easily followed up on.
The problem is that anti-spam solutions such as Akismet and Defensio aggressively filter out and stop trackback spam. Most Webmasters, just happy they aren’t being inundated with junk comments, never check their spam folders. This means that those trackbacks, which can be very useful in detecting scraping and plagiarism, are often filtered out before anyone, including the blogger, sees them.
Down the Rabbit Hole
Curious to see if this was a real problem or simply an academic issue, I delved into my Akismet spam folder to see what was there.
The sample was relatively small, approximately 1600 spam comments. This is mostly because I switched back and forth between Akismet and Defensio over the past week and because my blog automatically discards spam comments on posts older than one month.
However, when I did a filter search for “[…]” I found thirteen trackbacks, all of them containing various amounts of scraped content (Note: I found another thirteen in my Defensio folder, which had approximately 900 spam messages).
In every case the site was hosted on a “.info” domain, had scraped an excerpt of one of my stories and introduced it with a generic statement such as “While looking through the blogosphere we stumbled on an interesting post today. Here’s a quick excerpt.” None carried my digital fingerprint.
In most cases, the excerpts were fairly short though, in a few cases, they spilled over into a few paragraphs. In all cases, they were surrounded with advertising from a variety of sources.
Of the thirteen sites in my Akismet folder, about half were down and the remainder appeared to belong to the same spam blog network. However, these are thirteen scrapers I would never have known about if I hadn’t manually reached in and filtered through my trackback spam.
It’s a scary thought, and it makes me wonder how many I have missed up until now and, even worse, how many I will never get the chance to see.
The Good News
On the upside, these particular sites, though definitely scrapers, are not what I would call “high priority”. Since they only reposted an excerpt of the feed and did offer a link back, the damage they can cause is somewhat minimized. However, it is still annoying that this has been going on right underneath my nose and I never would have found out about it had it not been for manual intervention.
The fear isn’t so much that spam blogs like these will stay hidden, but that Akismet might bury someone scraping the full post or even the full feed. That seems to be somewhat rare, possibly an indication that Akismet considers the amount of reuse when analyzing whether a comment is spam or not, but without more Webmasters looking through their Akismet spam, there is no real way to know.
However, none of this is to say that Akismet, or any other spam plugin, is helping spammers out by assisting them in escaping detection. I would imagine the net effect of the plugin is still very bad for the spammers as they depend on these trackbacks and comment spams to build their networks.
Furthermore, any spammer caught up in your Akismet spam folder is likely an ineffective one to start with. If Akismet has already pegged the site as garbage, you can say with little doubt that Google, Technorati and others probably have as well.
Still, it may be worth a few moments to check your Akismet spam folder and see what you find.
Checking For Scraping
The process for checking your Akismet folder for potential scrapes is actually fairly simple. Visit your Akismet folder in your WordPress panel (it can be found under the “comments” section) and then, using the search box up top, type in “[…]”. It should take you to a list of the suspicious trackback posts.
This system is far from perfect, as not all trackback spam seems to include that intro, despite it being something of a standard. Unfortunately, neither Akismet nor Defensio offers a means to simply filter spam by type. This makes the above search, though somewhat simplistic, the best alternative at the moment.
If you are using Defensio and you perform the check, be certain to tick the box that includes “obvious” spam as the majority of trackbacks in my folder, ten total, were labeled as such.
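For anyone who would rather run this check against an exported copy of their spam queue, here is a minimal sketch of the same search in Python. It is only an illustration: the record layout (dictionaries with "type", "author_url" and "content" keys) and both function names are my own assumptions, not anything Akismet, Defensio or WordPress provides. The marker is the same string typed into the search box above.

```python
from collections import Counter
from urllib.parse import urlparse

# The same marker typed into the spam search box above.
MARKER = "[…]"


def find_suspicious_pings(comments, marker=MARKER):
    """Return trackbacks and pingbacks whose text contains the marker.

    `comments` is a hypothetical list of dicts with "type",
    "author_url" and "content" keys (an assumed export format).
    """
    hits = []
    for comment in comments:
        # Ordinary comment spam is skipped; only pings are of interest.
        if comment.get("type") not in ("trackback", "pingback"):
            continue
        if marker in comment.get("content", ""):
            hits.append(comment)
    return hits


def count_by_tld(pings):
    """Count the matching pings by the last label of the linking host."""
    counts = Counter()
    for ping in pings:
        host = urlparse(ping.get("author_url", "")).hostname or ""
        tld = host.rsplit(".", 1)[-1] if "." in host else ""
        counts[tld] += 1
    return counts
```

Passing the matches to count_by_tld gives a quick view of whether the sites cluster on one kind of domain, as mine did. Like the manual search, it will miss any trackback spam that does not include the marker.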
Conclusions
This situation is very frustrating. We, as bloggers, are forced to make a choice between being inundated with comment spam and being able to effectively follow up on scrapers and spam bloggers. Of the two, comment spam certainly seems to be the more annoying, especially considering the ratio of scraping to comment spam, and the more time-consuming to fight.
In short, the time it would take to deal with comment spam without Akismet far outstrips the time it takes to reach into the spam folder once every few weeks and search for suspicious pings.
Still, it is frustrating that Akismet does not offer an easier way to track these cases, either by enabling filtering on spam type or offering a special folder for suspicious blogs.
Though I definitely would rather have the protection than not, it would be nice if these plugins could help us stop kinds of spam other than just comments on our blogs.
As for me, I’m going to debate what action to take against these scrapers. Though I certainly can and probably will notify their advertisers, I am unsure about taking additional action due to the nature of the reuse.
In the meantime, I’m encouraging everyone to delve into their spam folders and see what they find. Be sure to let me know if you find anything exceptionally interesting in there. I’ll be eager to hear about what turns up.
Leave a comment below if you want to share.