Google Cracking Down on Plagiarists?
In a recent blog post, Google announced that it was cracking down on spam and would be updating its algorithms to target not only spammers, but also sites that simply copy original content (such as plagiarists and other garbage sites) and content farms that pump out large amounts of poor-quality but original content.
According to Google, there has been a slight uptick in spam, caused by the increased freshness of its results and its improved spidering, but the company is working to address it and hopes to have it under control soon.
But while the announcement was generally greeted with open arms, I, personally, am much more skeptical.
Others may think me a cynic, especially when I respond to talk of the upcoming changes with “I’ll believe it when I see it,” but I think I have good reason for my skepticism.
I sincerely hope Google makes a major dent in this fight, and I would like to be proven wrong, but history is not on Google’s side, nor is it on ours as content creators. Sadly, however serious Google is about its announcement, the changes aren’t likely to make a significant difference, and there are many reasons why.
A History Lesson
This isn’t the first time Google’s made a push against spam, plagiarists and other sites that they don’t want in the index. Google has already said that these upcoming changes are extensions of changes made in 2010. This was also the year that saw Google suing spammers (though that involved advertising spam).
But it wasn’t just 2010. In the past five years that I’ve watched Google deal with spam and plagiarism, there has been something of a cycle to it. Every so often, Google makes a major push on this front, sees some progress and then loses those gains over time, often fairly quickly.
The cycle seems to have begun in earnest in 2006, shortly after a contest between Google and Yahoo! to have the largest index. That is when Google started talking more about duplicate content issues and began purging its database of spam, most notably with its “Big Daddy” update.
There have been countless other changes as well. In 2007, for example, Google put more weight on domain age, among other things, in an effort to stop spam. That was also the year Google received a patent on a system for ranking pages by phrase and updated its spam reporting procedures.
Then, 2008 was the year of the Dewey update, which targeted spammy backlinks and prioritized new content. After that, 2009 brought the Google Caffeine update, which, in addition to its other changes, was also aimed at spam.
I could go on with the tweaks and pushes, but I think I’ve made my point. I even found one change from 2005, the Jagger update, that was designed to emphasize authority sites and target spam linking (similar to the 2008 update).
In short, the process of tweaking the algorithms against spam and plagiarism has been an ongoing one. The only thing that has really changed is that Google is now announcing the changes beforehand, rather than just making them and leaving SEOs to try and decipher what actually happened.
Other Reasons to Be Skeptical
If the nature of the ongoing war between Google and spammers/plagiarists doesn’t make the point, there are other reasons to doubt that Google will score a major victory as it rolls out its changes.
Consider the following:
- They Already Do a Good Job: If someone plagiarizes your content and your site is reasonably established, odds are that, 99% of the time or more, the copy either won’t rank at all or won’t rank well enough to compete with you. It’s the remaining 1% or less that’s the problem, and improving on that in a significant way will be very difficult.
- Spam is a Numbers Game: Spammers know the above well and focus on quantity, not quality. A better algorithm may simply prompt spammers to put out more garbage to make up for what gets filtered.
- Translated/Synonymized Still Tough: If there’s one thing we’ve learned from the recent plagiarism tests out of Germany, it’s that translated and other non-verbatim plagiarism is still almost impossible to detect. Since many of the systems tested rely on Google, it seems likely that Google hasn’t solved this problem yet.
- Crowdsourced Attack: Once any changes are made, the algorithm will be promptly tested by almost every single site on the Internet, spam and legitimate. Any chinks in the armor will be exposed.
- Human Plagiarism is Just Difficult to Detect Reliably: If someone sets up a free, non-spam blog and plagiarizes content, it’s always going to be tough to spot from a search perspective. Since many sites aren’t crawled regularly, plagiarists can appear to have posted the content first, and other signals, such as incoming links, can be misleading. If your site doesn’t get daily visits from Google’s spider, this will always be a problem (and it may be one even if it does).
In short, with current technology, Google can’t eliminate plagiarism from its index or, most likely, do a significantly better job than it does now. Though any improvement will be welcome, this change, when it comes, will likely not be a major blow against Web plagiarism, but rather just another battle in an ongoing war.
Bottom Line
If you’re looking for Google to keep your content safe and prevent the spammers and plagiarists from benefiting from your work, you’re looking to the wrong partner.
Putting aside Google’s dubious track record, the company does have every motivation to help you keep the bad guys out of its database, but the technology just isn’t there. It can keep away most of them, but not all.
We have to monitor our own work and understand how it’s being used (legitimately and otherwise) to understand the impact it is having on our search engine rankings. That means we still have to take matters into our own hands from time to time and be prepared to protect our work as appropriate and necessary.
That isn’t going to change anytime soon and the quicker we stop looking for miracle cures and quick fixes, the sooner we can start finding real solutions with real results.