
Akismet and Spam Blogs

Over the past few weeks, especially since the recent trackback spam attack, I’ve had some time to ponder anti-comment spam technology and how it relates to fighting content theft.

A recent post on the Akismet Blog particularly caught my eye. The post, entitled “Is It Spam?”, details the shift in spammer tactics and how some users have fallen for it, marking spam comments “not spam” despite the better judgment of Akismet.

However, it was one of the last paragraphs that really got me thinking:

In the case of the pingbacks (the ones that start [...]) the spammers are actually stealing your work…

Though that fact is clear to anyone who reads this site regularly, it occurred to me that our aggressive blocking of trackback and pingback spam may be keeping us from identifying spam bloggers and shutting them down.

Curious, I decided to check my own spam files and was more than a little surprised at what I found.

The Problem

As wonderful as Digital Fingerprints and other anti-splog features are, trackbacks and pingbacks remain one of the most common ways for a spam blog to be identified.

The reason is simple: spammers, usually in a bid to either appear legitimate or obtain some additional incoming links, often provide links to the original post. Those links, due to the nature of the software they use, cause trackbacks to be sent to the original site, and those trackbacks can lead Webmasters right to the people misusing their content.

Alternatively, bloggers often link to other articles on their own site, as I’ve done with this one, and those links get picked up when the article is scraped. Those links, in turn, produce trackbacks that can be easily followed up on.

The problem is that anti-spam solutions such as Akismet and Defensio aggressively filter out and stop trackback spam. Most Webmasters, just happy they aren’t being inundated with junk comments, never check their spam folders. This means that those trackbacks, which can be very useful in detecting scraping and plagiarism, are often filtered out before anyone, including the blogger, sees them.

Down the Rabbit Hole

Curious to see if this was a real problem or simply an academic issue, I delved into my Akismet spam folder to see what was there.

The sample was relatively small, approximately 1,600 spam comments. This is mostly because I switched back and forth between Akismet and Defensio over the past week and because my blog automatically discards spam comments on posts older than one month.

However, when I did a filter search for “[...]” I found thirteen trackbacks, all of them containing various amounts of scraped content (Note: I found another thirteen in my Defensio folder, which had approximately 900 spam messages).

In every case the site was hosted on a “.info” domain, had scraped an excerpt of one of my stories and introduced it with a generic statement such as “While looking through the blogosphere we stumbled on an interesting post today. Here’s a quick excerpt.” None carried my digital fingerprint.

In most cases, the excerpts were fairly short, though in a few cases they spilled over into a few paragraphs. In all cases, they were surrounded with advertising from a variety of sources.

Of the thirteen sites in my Akismet folder, about half were down and the remainder appeared to belong to the same spam blog network. However, these are thirteen scrapers I would never have known about if I hadn’t manually reached in and filtered through my trackback spam.

It’s a scary thought, and it makes me wonder how many I have missed up until now and, even worse, how many I will never get the chance to see.

The Good News

On the upside, these particular sites, though definitely scrapers, are not what I would call “high priority”. Since they only reposted an excerpt of the feed and did offer a link back, the damage they can cause is somewhat minimized. However, it is still annoying that this has been going on right underneath my nose and I never would have found out about it had it not been for manual intervention.

The fear isn’t so much that spam blogs like these will stay hidden, but that Akismet might bury someone scraping the full post or even the full feed. That seems to be somewhat rare, possibly an indication that Akismet considers the amount of reuse when analyzing whether a comment is spam or not, but without more Webmasters looking through their Akismet spam, there is no real way to know.

However, none of this is to say that Akismet, or any other spam plugin, is helping spammers out by assisting them in escaping detection. I would imagine the net effect of the plugin is still very bad for the spammers as they depend on these trackbacks and comment spams to build their networks.

Furthermore, any spammer caught up in your Akismet spam folder is likely an ineffective one to start with. If Akismet has already pegged the site as garbage, you can say with little doubt that Google, Technorati and others probably have as well.

Still, it may be worth a few moments to check your Akismet spam folder and see what you find.

Checking For Scraping

The process for checking your Akismet folder for potential scrapes is actually fairly simple. Visit your Akismet folder in your WordPress panel (it can be found under the “comments” section) and then, using the search box up top, type in “[...]”. It should take you to a list of the suspicious trackback posts.

This system is far from perfect, as not all trackback spam seems to include that intro, despite it being something of a standard. Unfortunately, neither Akismet nor Defensio offers a means to simply filter spam by type. This makes the above search, though somewhat simplistic, the best alternative at the moment.

If you are using Defensio and you perform the check, be certain to tick the box that includes “obvious” spam as the majority of trackbacks in my folder, ten total, were labeled as such.
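
For anyone who would rather script the check than use the search box, the same idea can be sketched in a few lines of Python. This is only a minimal illustration, not a feature of Akismet or Defensio: it assumes the spam queue has already been exported into a list of dictionaries, and the field names (`type`, `author_url`, `content`) and both function names are hypothetical. It keeps only trackbacks and pingbacks whose text begins with the “[...]” marker, then tallies the top-level domains they came from.

```python
from collections import Counter
from urllib.parse import urlparse

# Markers that typically open a trackback excerpt. Including the
# single-character ellipsis variant is an assumption, not a documented format.
EXCERPT_MARKERS = ("[...]", "[…]")


def find_suspicious_trackbacks(spam_items):
    """Return spam items that are trackbacks/pingbacks opening with an excerpt marker.

    Each item is assumed (hypothetically) to be a dict with the keys
    "type", "author_url" and "content".
    """
    hits = []
    for item in spam_items:
        is_ping = item.get("type") in ("trackback", "pingback")
        text = item.get("content", "").lstrip()
        if is_ping and text.startswith(EXCERPT_MARKERS):
            hits.append(item)
    return hits


def count_by_tld(items):
    """Tally the top-level domain of each item's source URL."""
    counts = Counter()
    for item in items:
        host = urlparse(item.get("author_url", "")).hostname or ""
        tld = host.rsplit(".", 1)[-1] if "." in host else ""
        counts[tld] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample_spam = [
        {"type": "trackback",
         "author_url": "http://scraper-example.info/post-1",
         "content": "[...] While looking through the blogosphere [...]"},
        {"type": "comment",
         "author_url": "http://pills-example.com/",
         "content": "Buy cheap pills now"},
    ]
    suspicious = find_suspicious_trackbacks(sample_spam)
    for item in suspicious:
        print(item["author_url"])
    print(dict(count_by_tld(suspicious)))
```

The same caveat applies here as with the manual search: any trackback that omits the “[...]” intro will slip past this filter, so it is a starting point rather than a complete audit.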

Conclusions

This situation is very frustrating. We, as bloggers, are forced to make a choice between being inundated with comment spam and being able to effectively follow up on scrapers and spam bloggers. Of the two, comment spam certainly seems to be the more annoying, especially considering the ratio of scraping to comment spam, and the more time-consuming to fight.

In short, the time it would take to deal with comment spam without Akismet far outstrips the time it takes to reach into the spam folder once every few weeks and search for suspicious pings.

Still, it is frustrating that Akismet does not offer an easier way to track these cases, either by enabling filtering on spam type or offering a special folder for suspicious blogs.

Though I definitely would rather have the protection than not, it would be nice if these plugins could help us stop kinds of spam other than just comments on our blogs.

As for me, I’m going to debate what action to take against these scrapers. Though I certainly can and probably will notify their advertisers, I am unsure about taking additional action due to the nature of the reuse.

In the meantime, I’m encouraging everyone to delve into their spam folders and see what they find. Be sure to let me know if you find anything exceptionally interesting in there. I’ll be eager to hear about what turns up.

Leave a comment below if you want to share.

12 comments
Melantrys

Someone who understands the title as an invitation...? ;)

Well, glad I could help make you feel better about this. :)

Hm, like I already mentioned in my email I can be such a clueless n00b about things at times (ok, most times), and the times I went and checked the pages out, I never found any way to contact anyone to express my views about my work getting used there.

JB

Melantrys,

It is frustrating, but in most cases you can do something about it. What I've taken to doing is hitting back at the advertisers and trying to cut off the revenue stream. If you can do that, then spammers will have no motivation to take your work.

Still, it is nice to know that I am not the only one seeing those trackbacks. I need to go back into my folder in a bit and see if I have anything new. I try to do it once every few days now.

But that raises the question, what kind of idiot scrapes Plagiarism Today?

Melantrys

While being here about the issue I emailed you about I did some browsing. :)

Once n00b blog tinkerer me realized that Akismet is effective at catching spam but won't prevent known IPs from coming back to get caught again and again and again, I started sifting through Akismet on a regular basis and blocking those IPs manually. (Besides, Akismet occasionally feels a strong dislike against two of my blogging friends and throws them into the spam folder, from where they then need rescuing.)

Recently I have been getting rather a lot of those trackbacks you are mentioning.
The pages range from ones similar to those you described to pages that seem to be WordPress theme examples. Mostly the latter.
Even so, it annoys the heck out of me to see my words (unimportant as they might be in the grand scheme of things) senselessly scattered all over the place.
Sometimes it seems for every IP I block, there will be two new ones tomorrow...

JB

Recliners: Very welcome! I'm glad it helped.

RS: I don't know about exhaustive, it didn't take that long to figure out the solution, but it was an interesting puzzle. It would still be very worthwhile to see these plugin makers rise up and address this issue though, for our sakes.

Recording Studio

That is a very exhaustive investigation and report. Thank you. It would still be worthwhile for Akismet etc to find out methods to prevent this from happening.

Recliners

Thanks for delving deep into that problem, your analysis has given me great insight into how malignant this problem is.
