Google Answers Question About Plagiarized Content, Kind Of

Jonathan Bailey, June 1, 2022

Earlier this week, search engine optimization (SEO) expert Muhammad Awais took to Twitter to ask Google’s John Mueller a seemingly simple question: Is there any specific percentage of plagiarism that is acceptable in content?

Mueller’s response was, to put it mildly, curt and direct.

Acceptable by whom? Why not aim for none?
— I am John (@JohnMu) May 30, 2022

Awais tried to follow up with Mueller, but received yet another curt response.

If your site gets most of its traffic from SEO experts, I'd ask them.
— I am John (@JohnMu) May 30, 2022

To be fair, it’s probably not a question that Mueller, or anyone else, can actually answer. SEO experts are far from alone in looking for a magical percentage that makes plagiarism acceptable. But that’s simply not how plagiarism works and, almost certainly, not how Google works.

But the issue of duplicate content has long been a thorny one for Google. Over the years, it has sent mixed messages on the subject.

In 2006, for example, Google told legitimate authors not to worry about duplicate content issues. This includes having your work scraped and republished on other sites. However, in February 2011, Google launched its “Panda” update, which heavily targeted content farms, including sites that were engaging in scraping.

Despite those efforts, a 2012 study found that Google identified the original content only 57% of the time, often ranking duplicates higher than the source material.

This has made copied content something of a sore spot with many Google watchers. Though Google claims not to penalize duplicate content, it also tries to keep its results varied. That means, if two pages are too similar, it’s unlikely that both will appear on the first page of results.

This, in turn, brings us to Mueller’s response. As short as it is, it’s a clear indication that Google prefers wholly original content. Though Google has said in the past that it doesn’t penalize quotes or citations, the response suggests that the more original the content is, the better.

This is no surprise to those who have been watching this space. But, for some SEO practitioners, it may be a cause for concern.

Why SEO Practitioners Care About Copied Content

For people interested in SEO, there are two separate reasons why this is an interesting topic:

Worried About Being Copied: Copying and republishing content has been a “black hat” SEO tactic for as long as there has been SEO. Scraping content, sometimes verbatim and sometimes with automated changes, has been an issue that has lingered over the field for at least 20 years and, though its popularity has risen and fallen, it remains a concern.
Determining How Original Content Must Be: On the flip side of the coin, creating new content takes a great deal of time and Google has a never-ending thirst for it. SEOs often wonder if they can repurpose content, either their own or others’, and still have it count as original. This has taken many forms, including article marketing and spinning.

But while it is clear that Google prefers original content, anyone seeking a bright-line rule on how much copied content is allowed is, most likely, going to leave disappointed.

First, from a plagiarism standpoint, there is no magic number where something becomes acceptable. As I pointed out recently, that is not how writing original content works. Changing X% of someone else’s work doesn’t make it yours.

Copyright is similar in this regard. It’s possible for two works to share a great deal of overlapping text and not be infringing. It’s also possible to share no overlapping text and be infringing. Simply put, these aren’t ideas or concepts that hinge upon percentages.
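To make that point concrete, here is a minimal sketch of the kind of overlap percentage people tend to ask about. It is an illustration only: the `shingles` and `overlap_percent` helpers, the three-word window and the sample sentences are all assumptions made for this example, and nothing here reflects how Google or any plagiarism checker actually works.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_percent(candidate, source, n=3):
    """Percentage of the candidate's n-word sequences that also appear in the source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return 100.0 * len(cand & shingles(source, n)) / len(cand)


# Hypothetical sample texts, invented for this illustration.
source = "The quick brown fox jumps over the lazy dog near the quiet river bank."
quoted = ('As the author wrote, "The quick brown fox jumps over '
          'the lazy dog near the quiet river bank."')
paraphrased = ("A fast, dark-furred fox leaps above a sleepy hound "
               "beside a calm stream.")

print(overlap_percent(quoted, source))       # 75.0
print(overlap_percent(paraphrased, source))  # 0.0
```

The properly attributed quote scores 75% while the uncredited paraphrase of the same sentence scores 0%. The number measures shared wording, not whether the use is acceptable, which is why no threshold on it can answer the original question.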

However, even from Google’s perspective, there’s likely no straightforward answer. Algorithms, such as the one Google uses to select search results, are monstrous beasts with countless moving parts.

It, like many other algorithms, can make decisions that even its creators don’t fully understand. Google is constantly adjusting and tweaking its systems, but the ultimate decision about what to display is the algorithm’s. Google can tell it what to value and judge the results, but everything in between is comparatively opaque.

This means that anyone interested in an answer to this question is likely to leave frustrated. It’s just not how any of the issues involved work.

Bottom Line

From that perspective, the question was deeply flawed. It shows a misunderstanding of the underlying issues in what is likely an attempt to fool Google into thinking unoriginal content is new.

While I am not someone who studies SEO myself, I get asked similar questions all the time regarding plagiarism and copyright issues. As a result, I’m very used to having to explain that the questions themselves are deeply flawed and that there is a need to rethink the issue.

To that end, I understand Mueller’s response. While it would be nice to have a better understanding of duplicate content issues and how they impact Google rankings, the issue is complicated and nuanced.

Ultimately, we may never find the answers we’re looking for here. The best advice I can give any webmaster is to write original content, check to see if your content is being copied and deal with any infringements, especially if they are hurting you in the rankings.

We may not be able to get an answer that works for everyone, but we know how to find the answer for each individual case.