Massive Trackback/Comment Spam Attack

By Jonathan Bailey • Nov 20th, 2007 • Category: Articles, News, Personal Experiences

Over the past 96 hours Plagiarism Today, as well as likely other sites, has been the subject of a massive spam attack across a variety of formats and domains.

The attack, which appears to have begun sometime on Friday, has been persistent for the past four days. However, at this juncture, it appears that my defenses are holding fairly well. Fortunately, reCAPTCHA was able to keep all of the spam comments from reaching the surface and Akismet only let about two or three dozen of the trackbacks through. All in all, the vast majority of it so far has been blocked, but more than enough seeped through to get my attention.

What appears to have happened is that an affiliate for Sportsbook.com has created dozens, if not hundreds, of spam sites and is sending out massive amounts of comment and trackback spam to promote them.

But while there is little unusual about the technique other than the sheer volume, hundreds of messages at this site alone, there are other elements of this attack that are unusual and may be a sign of what is to come in the future of Web spam.

Anatomy of An Attack

The attack, so far, has involved hundreds of trackbacks and comments sent from a variety of IP addresses across the globe. The wide dispersal of IP addresses suggests that the spammer is using a botnet, which in turn indicates that this attack involves many more sites than just this one.

The trackbacks and comments generally are filled with gibberish about gambling, often in a foreign language, and are linked to a variety of posts across a wide range of servers. The only thing consistent about the comments and trackbacks was that every one I have seen attempted to bold a passage in the spam using the “strong” tag.

However, what made this attack somewhat unique are the links that the spam messages pointed to. Rather than using the usual mixture of throwaway domains and blogspot blogs, the sites were spread all over the Internet, including at sites that have not, traditionally, had a major problem with spam creation.

Some of the sites involved included Google Groups, including several international variations of it, Blurty, OpenDiary, GreatestJournal, Viabloga and Multiply.

Even Oracle’s Bugzilla server was the host for one of the spam URLs.

But even though the URLs were spread across multiple services of different types, the resulting pages were very similar. They contained two large images (where possible) that linked to one of several throwaway .info domains. The domains then redirect any clicks to a page at sportsbook.com.

Below the images, the sites have several paragraphs of keyword-loaded content about gambling and its various subgenres. The gibberish nature of this content indicates that it is either automatically generated or is another case of spun, scraped content.

Either way, the end result is the same. Pure garbage and tons of spam.
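
For anyone who wants to screen their own moderation queue for this pattern, here is a rough sketch of the two traits described above: a bolded passage plus a link to one of the services involved. This is only an illustration, not how Akismet or reCAPTCHA work. The function names are hypothetical, and the domain names in the host list are my assumptions based on the services named in this post.

```python
import re
from urllib.parse import urlparse

# Hypothetical host list: domain names are assumed from the services
# named in this post, not taken from a verified blocklist.
SUSPECT_HOSTS = (
    "blurty.com",
    "opendiary.com",
    "greatestjournal.com",
    "viabloga.com",
    "multiply.com",
    "bugzilla.oracle.com",
)

STRONG_TAG = re.compile(r"<\s*strong\b", re.IGNORECASE)
URL = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or '' if it cannot be parsed."""
    try:
        return (urlparse(url).hostname or "").lower()
    except ValueError:
        return ""


def is_suspect_host(host: str) -> bool:
    """True if the host is one of the services seen hosting the spam pages."""
    host = host.lower()
    # Google Groups showed up under several country domains (FR, ES, IT).
    if host.startswith("groups.google."):
        return True
    return any(host == h or host.endswith("." + h) for h in SUSPECT_HOSTS)


def looks_like_this_attack(body: str) -> bool:
    """Flag text that bolds a passage AND links to a suspect host."""
    if not STRONG_TAG.search(body):
        return False
    return any(is_suspect_host(host_of(u)) for u in URL.findall(body))
```

A check like this will also catch legitimate comments that happen to bold a phrase and link to one of these services, so it is better suited to holding messages for review than to deleting them outright.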

Sample Links

Please note that all of the links below have been nofollowed. These links are designed to show samples of the pages on their various hosts. Be careful when following these links, as I cannot promise that they do not contain malware or adult content. I cannot vouch for the material on these sites.

All links were working as of 10 AM central time on November 20.

Google Groups (FR)
Google Groups (ES)
Google Groups (IT)
Multiply
GreatestJournal (link down within 4 minutes of reporting)
Bugzilla.Oracle.Com

What This Means to Bloggers

Spam is evolving. That not only includes the waves of trackback and comment spam that must be guarded against, but also new scraping sites and plagiarists.

Though we’ve seen a lot of tactic changes from spammers over the past few years, this one indicates a strong diversification. No longer are spammers focusing all or even most of their energy on “high value” targets.

On one hand, this could be a sign that the recent pushback at Blogspot regarding spam blogs is having the desired effect. However, it doesn’t mean that Google is off the hook, as nearly half of the links came from various Google Groups sites.

In short though, spammers are branching out and services that previously only had a minor problem with spam will likely soon find themselves at the forefront of the spam war. Sadly, these companies often have neither the technology nor the resources to battle against a major spam assault and cannot implement effective counter-measures fast enough.

This means that, not only will spammers likely start upping the amount of content theft and spamming they execute, but that it will be across a wider array of sites. Equally bad, blog services that were once trusted and highly-regarded could become spammy neighborhoods in short order as they become overrun by junk.

There is no real advice one can give for this, other than to be advised and be on the lookout. Odds are this spam attack was not an attack at all, but merely an intense salvo in a never-ending war.

Conclusions

As if to support the theory that it is just a salvo, I delved into my blocked spam folder and saw that a second attack has already begun, this one dealing with a check cashing scheme. Many of the same domains are involved, including Google Groups. The pages fit the same formula, though with one image instead of two, and it appears to be the work of the same group or individual.

Things are definitely getting ugly but, this time, the defenses seem to be holding a lot better. Most likely, Akismet has updated itself to deal with this new wave.

Still, the very nature of this evolution is going to make it a difficult one to track and stop. It is only a matter of time before another shift in the formula enables the spammers to break through, if but for a moment.

In the meantime, bloggers and Webmasters are caught in the middle, both having their content repurposed to fill the spammy pages and then fighting the fake trackbacks and comments.

As sad as it is, we’re fighting the war on two fronts and both fronts are shifting. Our tactics will have to change accordingly.

Short URL to this Post: http://copybyte.com/z/2i

Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • JB
    Tiesto: Indeed that is the nature of the battle. Hopefully the defenses will hold for at least a little while...
  • Tiesto
    I love your analysis of this ugly development. As you rightly stated, the “very nature of this evolution is going to make it a difficult one to track and stop” because those creeps are not going to stop. They will soon find a way to break through any defensive front against their nefarious activities.
  • JB
    Ben: reCAPTCHA has served me very well during this attack. I am grateful for it now more than ever. Despite thousands of attempts, no comment spam has gotten through. It's just too bad reCAPTCHA doesn't do trackback spam. I'm seeing a fail rate of about 10 a day on that front with Akismet.

    Speaking of which, I'm now giving Defensio a trial run to see if it can do any better. I guess we'll see.

    To answer your question about Sportsbook.com, the law is actually unclear here. If we follow the CAN-SPAM Act, companies advertised in spam can be held accountable by the FTC. However, that law only deals with email spam, not Web spam.

    If we look to copyright law, which makes sense considering so much scraping goes on as part of spam, there is no clear indication. The DMCA protects hosts and other OSPs, but not advertisers.

    Personally, I would think they have a responsibility here simply because, if they were found to be encouraging or otherwise inducing people to spam, I have no doubts a ticked off judge would have no trouble finding something to hit them with.

    There's no clear law here but, when it comes to something like spam, judges find a way to fold the law to get you.

    If you're going to court, don't be evil...

    Jeremy: Glad it isn't just me getting hammered by this. But Akismet is not holding up. Three more got through just last night. I've decided to test out Defensio and it seems to be working right now.

    Recliners: Glad to tip you off on this one. Sadly, I don't think I'm all that high value of a target, just in the way.

    Amy: Since a fan fiction is a derivative work, it is unlikely that, even if you could find the original author, they could do much about it. Though Paramount is pretty lenient about fan fiction with Star Trek, they're the ones that hold the copyright interest in the work itself and the ones best suited to stop this sale.

    If it is obviously a Star Trek fan fic being sold, contact the terms of use address at Star Trek and let them know about it.

    http://www.startrek.com/startrek/view/contact.html

    They will likely forward it on to whoever needs to see it at the company's legal department.

    Hope that helps!
  • Amy
    Hi,

    Totally unrelated to this post. I have a question. I recently bought an ebook whose premise sounded familiar and I was curious. But when I started reading it, I was very disappointed to see that it was a plagiarised version of a Star Trek fanfiction that I had read a while back. Scene by scene copied exactly. Even the characters had the same ethnicity. It was a waste of money buying that book. Now I am in a dilemma. Should I mention this, do something about it, or just let it lie? The fanfiction I read was written long ago and though her website is shut down her story is still posted on some sites. It's just that I feel so bad about someone not only taking the credit but making money off someone else's story. Any suggestions?
  • That article was an eyeopener for me. I had no idea that splogging was such a pernicious and malignant menace.

    By the by, congrats on becoming a 'high value target'!
  • So far Akismet has held up nicely for me - only 5 or so made it through and went straight to moderation. Been getting an average of 300-500 per day for a few days now, a bit scary considering the previous average (for all time) was only 70.
  • Ben Maurer
    I'm glad to hear reCAPTCHA is serving you well here :-). These kind of attacks are where CAPTCHAs really shine compared to Akismet -- Akismet takes some time to react to the initial wave.

    Just a thought here -- does sportsbook.com have any legal obligation to prevent its affiliates from using spam to promote its products? Looking at sportsbook.com's website, it seems to be incorporated offshore. However, the servers are located in the US.

    In general, it'd be very difficult to track the people who have this .info domain. However, if sportsbook.com and similar sites were to have the obligation to investigate instances of spam and were further obligated not to pay affiliates who used spam, there would be much less incentive to engage in spamming.

    Obviously, sportsbook.com doesn't *really* care if spam is used. To them, a customer is a customer. The only downside of the spam for them is possible harm to their reputation. I doubt that's really a concern for them :-).
  • JB
    Daniel: Well, it appears I spoke too soon. I should have checked my PR before I said anything. It appears my reeval request has done something, I'm back at a five.

    However, it was a three when the spam attack began. I left a comment on the Blog Herald on the seventeenth. This was a few hours after I had gotten the first trackbacks.

    http://www.blogherald.com/2007/11/17/is-google-...

    This roller coaster is very annoying. Still, I am glad for the good news!
  • JB
    Daniel: As tempting as your theory is, it runs totally counter to what has actually happened here.

    First, my site was among the many caught up in the recent PageRank shifts. Apparently Google didn't like my blogroll (I've never bought or sold links on PT) and I was knocked from a 5 to a 3. I've requested a reevaluation from Google but that hasn't gone through yet.

    As a spam target, PT has become devalued due to this. Alexa has held pretty steady, though has always been dead wrong, and there's not much else to be said about PT.

    However, what I find interesting isn't the slam that I got over the weekend, but the URLs that were involved. I think that's the real story here.
  • You are analyzing this all wrong. The increased interest in your site from spammers can only mean one thing: You have gained a higher ranking (PageRank, WebRank, or maybe even Alexa rank) in search engines.

    Which means links from your site are now very much more valuable. You can usually tell when you gain higher Google PageRank about a month in advance due to increased spam.