Spotting Spam Blogs

By Jonathan Bailey • Jul 15th, 2008 • Category: Articles, Personal Experiences


When people find out that their content is being copied without permission, how they seek to handle it is often determined, in part, by whether or not the site is a spam blog.

Where many might be willing to forgive copying by a novice blogger, especially with the promise of a link back, most are not prepared to have their content used so a spammer can trick the search engines and sell questionable items.

This means that, very often, I am forced to make snap judgments about whether a site is a spam blog or not, something that is becoming increasingly difficult as spammers have improved their techniques.

So how does one tell if a blog is a spam blog? The answer is not as simple as it once was, but there are still ways to detect a spammy site.

The Spammer Dilemma

Over the years, spammers have gotten better and better at making their blogs look human-edited. Though they still cannot make their sites appear to be “good” blogs, in many cases they can pass them off as the work of novice bloggers or non-native English speakers.

This can create quite a problem when approaching a suspected spam blog. Is it a spammer using the default Blogspot template, or someone new to blogging who doesn’t know how to change the template? Is the strange word choice the result of automated spinning or of someone learning English? If the spam blog did its job, it can be difficult to say.

However, most would agree that being heavy-handed with humans who copy, especially those who make some attempt to provide attribution, is counter-productive. The person struggling with English may grow into an important blogger or, worse yet, already be a major figure in their part of the world. That is why telling humans from machines matters.

But how do you do it? There are several methods, but unfortunately none of them works 100% of the time. So it is important to take all of the methods below into account, look at how spammers beat them, and develop an informed opinion.

PageRank Check

One of my sneakier tricks was to check the site’s PageRank and see whether Google had given it an n/a or a 0. Either would indicate that the site was very new or had been deemed spam by Google. Either way, it certainly warranted suspicion.

How Spammers Beat It: Tricking Google. This method has become less effective as Google seems to be assigning PageRank to more and more obvious spam blogs. That is a subject for another article.

Turning the Tide: PageRank is still a decent indicator of spamminess, but it is no longer as reliable as it was. If you have other reasons to be suspicious of a blog, don’t let a respectable PageRank talk you out of them.

“About” Page

Since spammers that use WordPress installs typically spend as little time as possible setting up their blogs, they routinely leave the “About” page, which is created as part of the install, with its default text. Very few human-generated sites have this problem.

How Spammers Beat It: Spammers have started either deleting or filling in the About page. However, those that fill it in frequently use it as an opportunity to keyword stuff, further tipping their hand as a spam blog.

Turning the Tide: If an About page does not contain actual information about the site or its owner, the site is very likely spam. Some spammers are starting to include fake information, but few seem able to resist the opportunity to keyword stuff and link.
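Both About-page tells can be checked mechanically once you have the page text. In this minimal sketch the default-text fragment is an assumption about the opening words of the stock WordPress About page of that era (verify it against the version you are examining), and the keyword-stuffing rule (one uncommon word appearing at least five times and making up more than 10% of the remaining words) is an arbitrary illustration, not an established threshold. The function name is hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: opening words of WordPress's stock "About" page text (2.x era).
# Check this against the default that ships with the version you are looking at.
DEFAULT_ABOUT_FRAGMENT = "this is an example of a wordpress page"

# Small illustrative stopword list so common words don't count as "keywords".
STOPWORDS = {"a", "about", "am", "an", "and", "are", "as", "at", "be", "for",
             "i", "in", "is", "it", "my", "of", "on", "or", "our", "the",
             "this", "to", "we", "with", "you", "your"}


def about_page_signals(text, min_repeats=5, max_share=0.1):
    """Flag untouched default text and crude keyword stuffing (guessed thresholds)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    content = [w for w in words if w not in STOPWORDS]
    top_count = Counter(content).most_common(1)[0][1] if content else 0
    # Short-circuit keeps us from dividing by zero when there are no content words.
    stuffed = top_count >= min_repeats and top_count / len(content) > max_share
    return {
        "default_text": DEFAULT_ABOUT_FRAGMENT in " ".join(words),
        "keyword_stuffed": stuffed,
    }
```

A real About page describing a person or project should come back with both values False; either one being True is a reason to look closer, not proof on its own.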

Posting Rate

The goal of a spam blog is to get as much junk content into it as possible; as such, spammers routinely post at an extremely high frequency, often well over 100 posts per day. It would be physically impossible for a human to post that much content without the aid of a machine, making this a dead giveaway that the site is spam.

How Spammers Beat It: Some spammers have begun to show restraint, only having their blogs update a few times per day and at irregular intervals, to more closely mimic a human blogger.

Turning the Tide: Unless the posting frequency is outrageous, the content is more telling than the rate. Consider an extremely high posting volume a dead spam giveaway, but don’t clear a site just because it posts at a reasonable rate.
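If you can gather the publication times of a suspect blog’s posts (from its feed or archive pages; collecting them is not shown here), the frequency check is simple arithmetic. The sketch below is a hypothetical illustration, not a tool the article describes: the function names are made up, and the threshold of 100 posts per day comes from the figure mentioned above. Note that it only ever answers “spam” or “inconclusive,” never “human,” matching the advice not to clear a site on a reasonable rate alone.

```python
from collections import Counter
from datetime import datetime, timedelta

# Threshold taken from the article's "well over 100 posts per day" figure.
DEAD_GIVEAWAY_PER_DAY = 100


def max_posts_per_day(timestamps):
    """Return the highest number of posts published on any single calendar day."""
    per_day = Counter(ts.date() for ts in timestamps)
    return max(per_day.values(), default=0)


def posting_rate_verdict(timestamps, threshold=DEAD_GIVEAWAY_PER_DAY):
    """'spam' if any day exceeds the threshold; otherwise 'inconclusive',
    because a normal rate says nothing either way and the content decides."""
    if max_posts_per_day(timestamps) > threshold:
        return "spam"
    return "inconclusive"


if __name__ == "__main__":
    start = datetime(2008, 7, 15, 8, 0)
    # 150 posts five minutes apart, all falling on the same day.
    burst = [start + timedelta(minutes=5 * i) for i in range(150)]
    print(posting_rate_verdict(burst))  # prints "spam"
```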

Formulaic Posting

We’ve all seen the spam blogs that start out with something like “I saw an interesting post today about…” and then proceed to inject a few keywords and quote from the scraped article. By themselves, these posts may appear semi-legitimate, especially with trackbacks, but they are clearly spam when you look at them as a group.

How Spammers Beat It: Spammers have started to use multiple post templates in the same blog. However, because the set of templates is small, the pattern is still easy to detect over the course of about ten posts.

Turning the Tide: Check and see if the posts have the same pattern, are roughly the same length or all contain quoted material. These are all signs of a spam blog.
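The three checks above can be roughed out in code for a sample of about ten posts. In this hypothetical sketch, the post bodies are assumed to be plain text you have already copied out, “the same pattern” is approximated as sharing the same first five words, and the 50% and 15% cut-offs are arbitrary guesses to tune rather than established values.

```python
from collections import Counter
from statistics import mean, pstdev


def formulaic_signals(posts, opener_words=5, shared_share=0.5, max_spread=0.15):
    """posts: list of plain-text post bodies (ideally around ten of them)."""
    if not posts:
        return {"shared_opener": False, "uniform_length": False, "all_quoted": False}
    # Same pattern: how many posts open with the same first few words?
    openers = Counter(" ".join(p.lower().split()[:opener_words]) for p in posts)
    top_share = openers.most_common(1)[0][1] / len(posts)
    # Roughly the same length: standard deviation of word counts relative to the mean.
    lengths = [len(p.split()) for p in posts]
    avg = mean(lengths)
    spread = pstdev(lengths) / avg if avg else 0.0
    # All contain quoted material: look for straight or curly opening quotes.
    quoted = sum(1 for p in posts if '"' in p or "\u201c" in p)
    return {
        "shared_opener": top_share >= shared_share,
        "uniform_length": spread <= max_spread,
        "all_quoted": quoted == len(posts),
    }
```

Because spammers rotate a handful of templates, a low shared-opener score on just two or three posts means little; the signal only becomes meaningful across a larger sample.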

Ugly Templates

Sometimes the first sign a blog is spam is its template. If the template is the default WordPress theme or a stock Blogspot theme without modifications, that’s a likely tip-off of spam content.

How Spammers Beat It: Spammers have been getting better about mixing up their themes. Most spam software applications come with a variety of themes that are rotated and, given the ease with which most blogs can be skinned, spam blogs can be amazingly varied.

Turning the Tide: Fortunately, spammer themes still lack any elements of hand-crafting. They very rarely have custom images (or have only very crude ones), the CSS often looks off, the color scheme is often jarring and the elements frequently do not fit together correctly. If you see a glaring mistake that anyone looking at the site would catch, it is likely spam.

Domain Names

Spam blogs are typically restricted to three types of domains: 1) .us, .info and other unusual extensions; 2) domains stuffed with keywords (and often hyphens); and 3) free blog hosts (still primarily Blogspot).

How Spammers Beat It: Spammers are participating in the domain aftermarket, snatching up expired domains that previously hosted sites. This gives them a more “honest” name and, in some cases, lets them carry over the old site’s PageRank. Spammers are also spreading to other free blog services, including little-known ones, as well as social networking sites.

Turning the Tide: If you are unsure about a domain, use Domain Tools to investigate it. Look specifically for false whois information or other irregularities. Still, most spam blogs are hosted on spam domains. Better ones are too expensive for spammers to buy in bulk and are more profitable at auction than as spam tools.
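A first pass over the three domain types listed above can be automated, though the WHOIS investigation still has to be done by hand. This is a minimal sketch with stated assumptions: the TLD and free-host lists contain only the examples named in this article (extend them as needed), “keyword stuffed” is crudely approximated as two or more hyphens in the blog’s name, and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative lists only: the article names .us, .info and Blogspot.
ODD_TLDS = {"us", "info"}
FREE_HOSTS = ("blogspot.com",)


def domain_signals(url):
    """Classify a blog URL against the three domain types described above."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    free_host = any(host == h or host.endswith("." + h) for h in FREE_HOSTS)
    # On a free host the blog's name is the subdomain; otherwise the second-level label.
    if free_host:
        name = labels[0]
    else:
        name = labels[-2] if len(labels) >= 2 else host
    return {
        "odd_tld": labels[-1] in ODD_TLDS,
        "keyword_stuffed": name.count("-") >= 2,
        "free_host": free_host,
    }
```

An expired domain bought on the aftermarket will pass all three checks, which is exactly why the WHOIS record is worth a look when anything else about the site seems off.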

Ad Excess/Spam Blogroll

Many spam blogs earn their money by framing the content in a slew of ads, generally from one of the public advertising networks. If not, they often use the blogroll to put out obviously spammy links in hopes of building PageRank and search engine position for those domains.

How Spammers Beat It: The formula is simple: fewer ads and fewer links per blog, but more spam blogs. Spammers have begun to show restraint with both their ads and their outbound links but are creating ever-larger spam farms to compensate. Spammers are also turning to alternate sources of revenue, such as Amazon affiliate IDs, to better hide their activities. Others mix “good” links with “spam” ones in their blogroll to further disguise the nature of the site.

Turning the Tide: One spam link is too many. Hover over the URLs in the blogroll and check for any that are suspicious or out of place. When checking ads, look not so much at quantity as for the appearance that they were simply “stuck in.” Spammers usually don’t have time to integrate ads with their site.
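Hovering over every link works, but for a long blogroll you could pull the links out of the page source first. The sketch below uses Python’s standard html.parser module to collect link targets from a blogroll’s HTML and flags hosts containing more than one hyphen. That hyphen rule is an assumed stand-in for “suspicious or out of place,” and the class and function names are hypothetical; the unflagged links still deserve a human look, since spammers mix in “good” links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML fragment such as a blogroll."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def suspicious_blogroll_links(html, max_hyphens=1):
    """Return link URLs whose host name looks keyword-stuffed (assumed rule)."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        host = urlparse(href).hostname or ""
        if host.count("-") > max_hyphens:
            flagged.append(href)
    return flagged
```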

Conclusions

When looking through these elements, any one of these would make me suspicious of a site’s origin, save perhaps if the site were hosted on a free blog host. Two, in turn, would make it a likely spam blog and three or above would make it a virtual lock.
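The rule of thumb above amounts to counting independent signals. Here is a hypothetical tally, assuming each check has already been reduced to a yes/no answer under whatever names you like. It reads “save perhaps if the site were hosted on a free blog host” as leaving free hosting out of the count entirely, which is one interpretation rather than a rule the article states outright.

```python
def overall_verdict(signals, ignored=frozenset({"free_host"})):
    """signals maps a (hypothetical) check name to True/False.
    Free hosting is ignored, per one reading of the caveat above."""
    hits = sum(1 for name, present in signals.items()
               if present and name not in ignored)
    if hits >= 3:
        return "virtual lock"
    if hits == 2:
        return "likely spam"
    if hits == 1:
        return "suspicious"
    return "no signals"
```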

The bottom line is that, while spammers are not making it any easier to spot their handiwork, it can still be detected by a careful eye (or a not-so-careful eye in many cases).

Though the spammer’s survival depends on staying under the radar and fooling humans and search engines alike, the nature of creating tens of thousands of junk blogs means that sacrifices have to be made and the results will have limitations.

By exploiting those weaknesses, we can continue to detect and stop spam and separate the spammers from those who are just getting started.


Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • Do blogspot *ever* deal with blogs that are reported as spam/scraped? An article from my wife's blogspot blog was copied - pics & all - onto another blogspot blog, replete with all the usual dodgy ads. It's been reported, but six days on, it's still there. I know you've written before about blogspot's lack of interest in keeping their house in order - it's disappointing that they seem to have improved not one iota since then.
  • It depends on HOW it is reported. I have never heard of a spam report providing quick results, but a DMCA notice generally does. My advice: file a DMCA notice with them and watch the work come down. It may take a week, but it works more reliably than filing a spam report and then waiting for their automated bot to bring the site down after it confirms your findings...
  • Certainly Blogger don't make it easy to complain - their "help" page (not terribly helpful) insists on paper notification via fax or mail - email not accepted. Ironic that they aren't interested in your ID when setting up a blogger account; only when trying to prevent you from protecting it.

    It's now become apparent that a series of themed blogs have stolen further content from my wife's blog. Earlier today, we filed complaints with Adsense; if that doesn't work we'll fax a DMCA to blogger (which I guess is what you meant above - unless there's an email option we've overlooked?)

    Can I say what a disgrace blogger is?

    More power to your elbow, Jonathan - your site has been a great help. Thanks.
  • No, neither Blogger nor Google in general makes it easy; that much is VERY certain. I agree there. There is a way to send a notice via email, though: if you can create a PDF with your scanned signature, you can send it in that way. I do it all of the time and it works well. But it's not something they advertise and you really have to know where to look; in fact, the info isn't even on Google's site.

    Drop me an email if you want more information, I don't feel comfortable publishing the email information right here but will gladly send it to you.

    But yes, I agree that it is a disgrace, though I have seen signs of improvement. Let's hope they keep it up.

    Thanks for the praise of the site!
  • Hi there Jonathan,
    I am not a spammer! I am now a legitimate "novice blogger," as you so quaintly call it!
    This post has been very educational and I have found it useful. I have been visiting your blog regularly, but have not been able to make any useful comments for some time now. Now that I found a post, let me compliment you on a very well-written one. I am now able to understand the content of your blogs!
    In case my blog ID throws you, this is Ramana Rajgopaul. Cheers.
  • Ramana, glad to have you back and even more glad you liked the article! Definitely keep me posted on what you think and if there is anything that I can do to improve.

    Oh, and I don't know if I'd call it quaint... Actually, yes I would...