When people find out that their content is being copied without permission, how they seek to handle it is often determined, in part, by whether or not the site is a spam blog.
Where many might be willing to forgive copying by a novice blogger, especially with the promise of a link back, most are not prepared to have their content used so a spammer can trick the search engines and sell questionable items.
This means that, very often, I am forced to make snap judgments about whether a site is a spam blog or not, something that is becoming increasingly difficult as spammers have improved their techniques.
So how does one tell if a blog is a spam blog? The answer is not as simple as it once was but there are still ways one can detect a spammy site.
The Spammer Dilemma
Over the years, spammers have gotten better and better at making their blogs look human-edited. Though they still cannot make their sites appear to be “good” blogs, in many cases they can pass as the efforts of novice bloggers or of non-native English speakers.
This can create quite a problem when approaching a suspected spam blog. Is it a spammer using the default Blogspot template or is it someone new to blogging who doesn’t know how to change the template? Is the strange word choice the result of automated spinning or someone learning English? If the spam blog did its job, it can be difficult to say.
However, most would agree that being heavy-handed with humans who copy, especially those who make some attempt to provide attribution, is counter-productive. Especially when you consider that the person struggling with English may either grow into an important blogger or, worse yet, already be a major figure in their part of the world, it becomes clear why telling humans from machines is important.
But how to do it? There are several different ways, but unfortunately none of them works 100% of the time. So it is important to take all of the methods below into account, look at how spammers beat them, and develop an informed opinion.
One of my sneakier tricks was to check the site’s PageRank and see if Google had given it an n/a or a 0. Either value would indicate that the site was very new or had been deemed spam by Google. In both cases, it certainly warranted suspicion.
How Spammers Beat It: Tricking Google. This method has become less effective as Google seems to be assigning PageRank to more and more obvious spam blogs. That is a subject for another article.
Turning the Tide: PageRank is still a decent indicator of spamminess, but it is no longer as reliable as it was. If you have other reasons to be suspicious of a blog, do not let a respectable PageRank talk you out of them.
Since spammers that use WordPress installs typically spend as little time as possible setting up their blogs, they routinely leave the “About” page, which is created as part of the install, with its default text. Very few human-generated sites have this problem.
How Spammers Beat It: Spammers have started either deleting or filling in the about page. However, those that fill in the page often use it as an opportunity to keyword stuff, often further tipping their hand as a spam blog.
Turning the Tide: If an about page does not have actual information about the site or the owner, it is very likely spam. Some spammers are starting to include fake information, but few seem to be able to resist the opportunity to keyword stuff and link.
The goal of a spam blog is to get as much junk content into it as possible. As such, spammers routinely have extremely high posting frequencies, often well over 100 posts per day. It would be physically impossible for a human to post so much content without the aid of a machine, which makes such a volume a dead giveaway that the site is spam.
How Spammers Beat It: Some spammers have begun to show restraint, only having their blogs update a few times per day and at irregular intervals, to more closely mimic a human blogger.
Turning the Tide: The content is more telling than the frequency, unless the posting frequency is outrageous. Consider an extremely high posting volume to be a dead spam giveaway but don’t write off a site because it has a reasonable rate.
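For those who would rather let a script do the counting, the frequency check can be sketched roughly as follows. This is only an illustration: the function names are made up for this example, the cutoff simply borrows the "well over 100 posts per day" figure, and the post times are assumed to have been collected already (from a feed, for instance).

```python
from collections import Counter
from datetime import datetime

# Assumed cutoff, borrowed from the "well over 100 posts per day" figure.
OUTRAGEOUS_POSTS_PER_DAY = 100


def max_posts_per_day(timestamps):
    """Return the largest number of posts published on any one calendar day."""
    per_day = Counter(ts.date() for ts in timestamps)
    return max(per_day.values(), default=0)


def frequency_signal(timestamps, threshold=OUTRAGEOUS_POSTS_PER_DAY):
    """True only for an outrageous volume; a reasonable rate proves nothing."""
    return max_posts_per_day(timestamps) > threshold


# Made-up example data: 150 posts within a few hours of a single day.
burst = [datetime(2008, 1, 1, i // 60, i % 60) for i in range(150)]
print(frequency_signal(burst))
```

Note that the check only ever says "spam" or "no opinion". A site that posts a few times a day passes it, which is exactly why the content still has to be looked at.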
We’ve all seen the spam blogs whose posts start out with something like “I saw an interesting post today about…” and then proceed to inject a few keywords and quote from the scraped article. By themselves, these posts may appear semi-legitimate, especially with trackbacks, but they are clearly spam when you look at them as a group.
How Spammers Beat It: Spammers have started to use multiple post templates in the same blog. However, the limited set means that, if this method is chosen, it is still easily detected over the course of about ten posts.
Turning the Tide: Check and see if the posts have the same pattern, are roughly the same length or all contain quoted material. These are all signs of a spam blog.
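The pattern check can also be roughed out in code. The sketch below assumes the text of the most recent posts is already in hand and looks at only two of the signs: a shared opening phrase and near-identical post lengths. It does not try to detect quoted material. All names and cutoffs here are guesses for illustration, not tested values.

```python
from collections import Counter
from statistics import mean, pstdev


def shared_opening_ratio(posts, n_words=4):
    """Fraction of posts that begin with the most common n_words-word opening."""
    openings = [" ".join(p.lower().split()[:n_words]) for p in posts]
    if not openings:
        return 0.0
    top_count = Counter(openings).most_common(1)[0][1]
    return top_count / len(openings)


def length_spread(posts):
    """Std. deviation of word counts divided by their mean (0 = identical)."""
    lengths = [len(p.split()) for p in posts]
    if not lengths or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def template_signal(posts, min_posts=10, opening_cutoff=0.5, spread_cutoff=0.15):
    """True when the posts share an opening or are all nearly the same length."""
    if len(posts) < min_posts:
        return False  # too few posts to judge a pattern
    return (shared_opening_ratio(posts) >= opening_cutoff
            or length_spread(posts) <= spread_cutoff)
```

The minimum of ten posts mirrors the observation that a limited set of templates shows itself over the course of about ten posts. A blog that rotates several templates may need a lower opening cutoff to be caught.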
Sometimes the first sign a blog is spam is its template. If the template is the default WordPress theme or a stock Blogspot theme without modifications, it’s a likely tip-off of spam content.
How Spammers Beat It: Spammers have been getting better about mixing up their themes. Most spam software applications come with a variety of themes that are rotated and, given the ease with which most blogs can be skinned, spam blogs can be amazingly varied.
Turning the Tide: Fortunately, spammer themes still don’t have any elements of hand-crafting. They very rarely have custom images (or contain only very crude ones), the CSS often looks off, the color scheme is often jarring and the elements often do not fit together correctly. If you see a glaring mistake that would be caught by anyone looking at the site, it is likely spam.
Spam blogs are typically restricted to three types of domains: 1) .us, .info and other strange extensions; 2) domains stuffed with keywords (and often hyphens); and 3) free blog hosts (still primarily Blogspot).
How Spammers Beat It: Spammers are participating in the domain aftermarket, snatching up expired domains that have had sites on them previously. This helps them obtain a more “honest” name and, in some cases, carry over the PageRank of the old site. Spammers are also spreading to other free blog services, including little-known ones, as well as social networking sites.
Turning the Tide: If you are unsure about a domain, use Domain Tools to investigate it. Look specifically for false whois information or other irregularities. Still, most spam blogs are hosted on spam domains. Better ones are too expensive for spammers to buy in bulk and are more profitable at auction than as spam tools.
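The three domain types lend themselves to a quick first-pass check before any whois digging. The following is a minimal sketch using only Python's standard library. The lists contain just the examples named above, the hyphen cutoff is an assumption, and the signal names are invented for this example. It does not look at whois data at all.

```python
from urllib.parse import urlparse

SUSPECT_TLDS = {"us", "info"}       # extensions named above; extend as needed
FREE_BLOG_HOSTS = {"blogspot.com"}  # the one free host named above
MIN_HYPHENS = 2                     # assumed cutoff for "stuffed with hyphens"


def domain_signals(url):
    """Return the set of domain warning signs found in a full URL."""
    # hostname is None when the URL has no scheme, so fall back to "".
    host = (urlparse(url).hostname or "").lower()
    signals = set()
    if host.split(".")[-1] in SUSPECT_TLDS:
        signals.add("odd_extension")
    if host.count("-") >= MIN_HYPHENS:
        signals.add("hyphen_stuffed")
    if any(host == h or host.endswith("." + h) for h in FREE_BLOG_HOSTS):
        signals.add("free_host")
    return signals


print(domain_signals("http://cheap-pills-online.info/post"))
```

A clean result here means little on its own, since an expired domain bought on the aftermarket would pass every one of these tests.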
Ad Excess/Spam Blogroll
Many spam blogs earn their money by framing the content in a slew of ads, generally from one of the public advertising networks. If not, they often use the blogroll to put out obviously spammy links in hopes of building PageRank and search engine position for those domains.
How Spammers Beat It: The formula is simple: fewer ads, fewer links, more spam blogs. Spammers have begun to show restraint with both their ads and their outbound links but are creating larger and larger spam farms to compensate. Spammers are also turning to alternate sources of revenue, such as Amazon affiliate IDs, to better hide their activities. Others will mix “good” links with “spam” ones in their blogroll to further hide the nature of the site.
Turning the Tide: One spam link is too many. Hover over the URLs in the blogroll and check for any that are suspicious or out of place. When checking for ads, look not so much at quantity as for the appearance that they were simply “stuck in”. Spammers usually don’t have time to integrate ads with their site.
Any one of these elements would make me suspicious of a site’s origin, save perhaps hosting on a free blog host. Two would make it a likely spam blog, and three or more would make it a virtual lock.
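That rule of thumb is simple enough to write down as a small function. This is a sketch only: the signal names are hypothetical labels for whichever warning signs were spotted, and the free-host exception is read here as "a free blog host by itself does not count."

```python
def verdict(signals):
    """Turn a set of spotted warning signs into a rough verdict."""
    found = set(signals)
    # A free blog host by itself is not enough to raise suspicion.
    if found == {"free_host"}:
        return "no verdict"
    if len(found) == 0:
        return "no verdict"
    if len(found) == 1:
        return "suspicious"
    if len(found) == 2:
        return "likely spam"
    return "virtual lock"  # three or more signs


print(verdict({"default_about_page", "hyphen_stuffed"}))
```

Counting signs this way treats them all as equal, which is a simplification. An outrageous posting volume, for instance, is a giveaway by itself.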
The bottom line is that, while spammers are not making it any easier to spot their handiwork, it can still be detected by a careful eye (or a not-so-careful eye in many cases).
Though the spammer’s survival depends on staying under the radar and fooling humans and search engines alike, the nature of creating tens of thousands of junk blogs means that sacrifices have to be made and the results will have limitations.
By exploiting those weaknesses, we can continue to detect and stop spam and separate the spammers from those who are just getting started.