How Web Spam is Evolving

For as long as there have been search engines, there have been people trying to game them to get a better position for more popular keywords.

In the early days of the Web, keyword stuffing was the way to go. Basically, a nefarious site would simply repeat, over and over, the keywords it wanted to rank for in hopes that the search engines would think the site relevant and link to it.

Once links became an important metric for determining search engine ranking, link spamming became a popular tactic. This included comment, forum and wiki spamming to get a lot of links quickly (among other tactics).

When search engines caught on to those tactics (and webmasters began to routinely “nofollow” third-party links), spammers began to move to content marketing, filling up garbage sites with as much content as possible, usually either stolen or automatically generated, in hopes of fooling the search engines.

And so it’s gone. Every time the spammers have latched on to a new technique for fooling the search engines (at least some of the time), legitimate sites have been caught in the crossfire. This has included comment spam, RSS scraping and even site hacking, as innocent sites are dragged into the spam wars by shortcut-seeking authors.

However, the main thing is that Web spam in 2014 does not look the same as it did ten, five or even two years ago. Web spam is evolving, and it’s doing so because of the pressures placed on it by the search engines, in particular Google.

That means that the pressures it places on you and your site are different and you need to be aware of the changes.

Out With Article Marketing, In With Snippets

Google recently began to clamp down on various SEO tactics that it considered undesirable. The largest and most controversial of these updates have been the Panda updates, which began in February 2011. Those updates, of which there have now been four, have targeted content farms, article marketing and sites with lots of content similar to that of other sites.

The basic idea is to locate sites with unoriginal and/or low quality content and bump them down in the results.

For legitimate content creators, this has been something of a double-edged sword. Though tactics such as RSS scraping and plagiarizing articles became less effective for spammers, they also became more dangerous for content creators. If Google correctly determines which site is the original, Panda is a boon, shoving the copycats down the results. But when Google gets it wrong, as it does with some regularity, it’s the original site that is dropped, and dropped harder.

But while plenty of spam sites continue to use long-form content and often acquire that content through scraping or regular copy/paste, other types of spam have been rising in popularity.

These sites focus not on long-form content, such as blog posts, but on snippets. They disguise themselves as “top sites”, “search results” or “best of” lists. In reality though, they’re just spam sites that scraped snippets of content, either from the search results themselves or the sites directly, and are displaying it in a way that looks like a top list.

Spam Sample Image

As you can see, the site uses content from each of the pages it links to (if the links are valid, which in many cases they are not).

However, this isn’t the only “new” spam tactic that’s gaining popularity. A completely different approach has also been on the rise over the past few years and it’s one that targets legitimate sites in a very direct way.

Hacking and Spamming

Last year, I wrote about how spammers have been hacking legitimate sites with the intent not of shutting them down or defacing them, but posting semi-hidden pages with their spam content.

This content is usually a return to form for spammers, leaning more on long-form content, both lifted and generated, along with links and other, more traditional, spammer tactics.

These methods work on hacked sites because spammers target sites that Google already knows and trusts. Where such methods would likely fail on a new domain or even one that’s been purchased and repurposed, on a site where the majority of the content is known and legitimate, these tactics can often slip under the radar.

How these hacks happen varies from site to site, but it’s safe to assume that there are automated attempts to hack your site taking place almost constantly. Usually these attempts aren’t very skilled or harmful, just looking to pick off as much low-hanging fruit as possible, but sometimes they can go farther, such as with the 2013 botnet attack that shut down many websites, including this one, by overwhelming services.

(Note: Plagiarism Today was not compromised in that attack. It was merely unavailable for a brief period of time.)

Still, more popular sites may be subject to more pointed attacks. However, even that’s unlikely, as site hacking efforts, much like other spam efforts, are numbers games where quantity is more important than quality. For a spammer, it’s better to breach hundreds of lower-end sites than one high-end one.

So, as long as you take basic security procedures, you’re likely relatively safe from this kind of hacking. However, that doesn’t mean you’re safe from your content appearing on hacked sites.

What Does This Mean For You?

As a legitimate, non-spamming webmaster/blogger/creator, you’re likely wondering what this evolution means for you.

There’s definitely a good news/bad news situation here.

On the good side, more sites switching to snippets means less RSS scraping and whole-site or whole-page plagiarism, which were major issues just a few years ago. While those approaches are still hanging around (and are more dangerous than ever), spammers at the forefront won’t be making heavy use of them.

On the bad side, the inbound links from spam sites can still hurt you. Fortunately, you can use the Disavow Tool in Google Webmaster Tools to distance yourself from those spammy links. It’s not a perfect system, but it’s better than the alternative.

Also on the bad side, Google’s continued clampdown on unoriginal content means that any spammers who are still copying content wholesale can be disastrous, especially to sites that are just getting established and haven’t built up a great deal of trust.

As for what you should do, if you see your site being linked to by spam blogs using snippets, odds are there isn’t any action you need to take. I encourage you to join Google Webmaster Tools and, if the links begin to have a negative impact or produce any warnings, disavow them. Otherwise, there’s not much you can do as a copyright notice isn’t appropriate and reporting the spam isn’t likely to do much good.
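If you do end up disavowing links, the file you upload is plain text: one URL or one `domain:` entry per line, with lines beginning with `#` treated as comments. Below is a minimal sketch of how you might build that file from a list of spammy links. The function name and the example hostnames are hypothetical, and it assumes you want to disavow whole domains rather than individual pages.

```python
from urllib.parse import urlparse


def build_disavow_file(spam_urls, whole_domains=True):
    """Return the text of a disavow file for the given spam link URLs.

    Hypothetical helper: the name and the whole_domains flag are
    illustrative assumptions, not part of any Google tool.
    """
    entries = []
    seen = set()
    for url in spam_urls:
        if whole_domains:
            host = urlparse(url).hostname
            if not host:
                continue  # skip anything that does not parse as a URL
            entry = "domain:" + host
        else:
            entry = url.strip()
        # Keep the first occurrence of each entry, preserving order.
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    # Lines starting with "#" are comments in the disavow format.
    lines = ["# Spam sites linking to my content (example list)"]
    lines.extend(entries)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_disavow_file([
        "http://spam-toplist.example/best-of",
        "https://scraper.example/page?x=1",
    ]))
```

Disavowing a whole domain is the blunter option, but with snippet spam sites there is rarely a reason to keep any link from them.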

For large-scale text infringement, it’s best to continue monitoring and removing infringements as needed (either from the host or from Google). However, that’s par for the course for content theft recommendations.
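To give a rough idea of what that monitoring can look like under the hood, here is a small sketch that measures how much of your article shows up word-for-word in a suspect page. It uses overlapping word sequences (often called shingles), which is one common approach and not necessarily what any particular detection service does. The function names and the five-word window are assumptions chosen for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A result near 1.0 suggests the suspect page contains most of your text verbatim, while a page that only lifted a short snippet will score much lower. Any cutoff for "worth acting on" is a judgment call you would have to tune yourself.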

Finally, take reasonable precautions to defend your site against hacking. While you won’t be able to stop a determined hacker, you can definitely prevent yourself from being a victim of an automated system by not using the default username (which is “Admin” on WordPress), using strong passwords, keeping current on security releases (platform, plugins and themes) and considering either two-factor authentication or locking out users who fail to enter the correct password too many times.

A few basic security precautions won’t make you hack-proof, but they will keep you from being low-hanging fruit.
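To make the lockout idea concrete, here is a minimal sketch of the logic behind locking out an account after too many failed passwords. On a real site you would normally rely on an existing plugin or your platform's own settings rather than writing this yourself. The class name, the five-attempt limit and the 15-minute lockout are all assumptions picked for illustration.

```python
import time


class LoginLockout:
    """Track failed logins per username and lock out after too many.

    Hypothetical example class; the thresholds are illustrative defaults.
    """

    def __init__(self, max_failures=5, lockout_seconds=900, clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.clock = clock       # injectable so the logic can be tested
        self._failures = {}      # username -> consecutive failure count
        self._locked_until = {}  # username -> time at which the lock expires

    def is_locked(self, username):
        until = self._locked_until.get(username)
        if until is None:
            return False
        if self.clock() >= until:
            # The lock has expired, so clear it and start counting afresh.
            del self._locked_until[username]
            self._failures.pop(username, None)
            return False
        return True

    def record_failure(self, username):
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= self.max_failures:
            self._locked_until[username] = self.clock() + self.lockout_seconds

    def record_success(self, username):
        # A correct password resets the failure count and any lock.
        self._failures.pop(username, None)
        self._locked_until.pop(username, None)
```

The login handler would check `is_locked()` before even looking at the password, then call `record_failure()` or `record_success()` depending on the result. Against an automated system guessing thousands of passwords, even a short lockout makes the attack impractically slow.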

Bottom Line

In the early days of the Web, spammers were fairly harmless. While their gaming of search engines often resulted in legitimate sites being bumped down in the rankings, their efforts didn’t directly impact other sites.

However, the day spammers started scraping RSS feeds and plagiarizing legitimate webmasters, legitimate sites went from being innocent bystanders to being, in the eyes of the spammers, acceptable casualties.

Spammers may not be intentionally hurting your site (though in some cases they likely are), but they certainly don’t care about it either. They don’t care how hard you worked, how great your content is or whether you truly deserve that number one slot. All they care about is their own ranking and getting there as quickly and easily as possible, consequences be damned.

So rather than thinking of web spam as something that happens around your site, you need to think of it as something that happens to your site as well.

After all, when it comes to the search results, what happens to one site directly affects another and, with web spam, that impact for you is far too often being denied the traffic and visitors you deserve.
