Google Addresses Scraped and Spun Content

Jonathan Bailey, December 1, 2022

In a recent Google SEO Office Hours video, Dan Nguyen, from Google’s search quality team, answered a pair of questions that directly addressed content that is scraped and/or spun from other material.

The first question, at 9:19 in the video, asks, “How should content creators respond to sites that use AI to plagiarize the content, modify it, and then outrank them in search results?”

To that, he answered that such sites violate Google’s anti-spam rules and that Google has “many algorithms” that go after such behavior. He asked that, if Google does make a mistake and such a site ranks well, creators report it via Google’s spam reporting tool.

In a similar question at 17:05 in the same video, another user asked, “Why Google is not taking action on copy or spun web stories? Can you check on Discover?”

Once again, Nguyen gave a similar answer, but said that they were “aware of these attempts and are looking into them.” However, he added that such sites, generally, are demoted by their algorithms.

While the answers are definitely thin, Google has a long history of ducking these complicated issues. Back in June of this year, Google’s John Mueller addressed similar concerns, but with an even more evasive tone.

This was in stark contrast to what Mueller said in June 2021, when he openly admitted that, on occasion, Google can accidentally rank copied content over original works and encouraged people to file copyright notices. This is especially true in cases where, for one reason or another, Google doesn’t trust the original website.

The questions and Nguyen’s responses seem to indicate that this issue is very much still ongoing, despite Google’s best efforts. However, these comments may indicate a slightly different approach to targeting and handling such sites.

The Never-Ending Fight

Content scraping is a process through which content is automatically grabbed from one location on the internet and put somewhere else. Though this can be done for a variety of reasons, spammers often use it to create clone websites in hopes of drawing traffic.

Article spinning is a process that automatically rewrites articles by replacing key words with synonyms. This is often paired with content scraping to try and make the content appear original, even if it is less readable.
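To make the mechanics concrete, here is a minimal sketch of synonym-swap spinning in Python. It is only an illustration of the general idea described above, not a reconstruction of any real spinning tool. The `SYNONYMS` table, the `spin` function and the sample sentence are all hypothetical and invented for this example.

```python
import re

# Hypothetical synonym table, invented purely for illustration.
SYNONYMS = {
    "quick": "fast",
    "big": "large",
    "buy": "purchase",
    "help": "assist",
}


def spin(text, synonyms=SYNONYMS):
    """Swap every word found in the synonym table, keeping a leading capital."""

    def swap(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word  # no synonym known, leave the word alone
        if word[0].isupper():
            return replacement.capitalize()
        return replacement

    # Treat each run of ASCII letters as a word; punctuation is untouched.
    return re.sub(r"[A-Za-z]+", swap, text)


original = "Quick tips to help you buy a big monitor."
print(spin(original))
```

Because the swap ignores grammar and context, the result ("to assist you purchase") reads worse than the source, which is the loss of readability mentioned above.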

This combination, as we discussed in this retrospective, has been around since at least 2004, spearheaded by the then-popular Article Bot software. Google, for its part, has largely viewed scraped and/or spun content as undesirable and worked to either remove it or demote it in its search results.

In February 2011, Google struck a major blow with its Panda/Farmer updates, which successfully targeted and demoted scrapers, spinners and other “low-quality” sites. Seemingly overnight, scraping and spinning took a backseat.

However, they never really went away and started to come back into the limelight in the late 2010s and early 2020s. Simply put, for spammers, it was and always has been a numbers game. If Google successfully demotes or bans 99.9% of scraping/spinning sites, then spammers just need to launch 1,000 sites to expect one to get through. That is incredibly easy to do with existing software.
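The arithmetic behind that numbers game can be worked through in a few lines of Python. This is a rough sketch that assumes each site is caught independently at the same rate, which is a simplification introduced here and not something Google has stated. The function names are hypothetical.

```python
def expected_survivors(sites, demotion_rate):
    """Average number of sites that slip through, given the share that is caught."""
    return sites * (1.0 - demotion_rate)


def chance_any_survive(sites, demotion_rate):
    """Probability that at least one site slips through.

    Assumes each site is caught independently, a simplifying assumption.
    """
    return 1.0 - demotion_rate ** sites


sites = 1000
demotion_rate = 0.999  # the 99.9% figure used in the text

print(f"Expected survivors: {expected_survivors(sites, demotion_rate):.2f}")
print(f"Chance at least one survives: {chance_any_survive(sites, demotion_rate):.0%}")
```

Under these assumptions, 1,000 sites yield about one survivor on average and roughly a 63% chance that at least one ranks. Launching more sites pushes both figures up, which is why a very high catch rate still leaves the door open.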

That, in turn, is very much where we are today. Scraping and spinning have come back into favor as spam tools and Google is struggling to stop them, all while legitimate content creators are caught in the crossfire.

A Changing Tactic

One notable difference between the way Mueller addressed the issue in 2021 and the way Nguyen addressed it this year is in how each suggested webmasters should respond.

Though the two were talking about different things, with Mueller focusing solely on scraping and Nguyen focusing on scraping and spinning, Nguyen framed the issue as a spam one while Mueller focused on the copyright aspect.

This might seem to be a small difference, but Nguyen seems to recognize that much of the problem goes beyond what can be trivially handled through copyright, with many of the spinning sites either stitching together small snippets from a large number of articles or scrambling the content beyond recognition (though still targeting similar keywords).

But while such minute content usage might clear spammers of copyright issues, they are still very much a threat to original creators. Scrapers and spinners can easily create many, many times more content than any original author and, as Google has admitted, can go as far as to beat the original creator in the search results.

Treating it more as a spam problem may be Google’s way of addressing how spinning and scraping are evolving, as the reproduced content often looks very little like the source.

To that end, this is similar to the ongoing controversy over artificial intelligence. AI systems, much like spinners and scrapers, are often seeded with content created by humans. What rights do creators have when, without their work, the new creation could not exist, but it’s not similar enough to the original to be a direct copyright infringement?

Either way, it’s clear that Google is still treating this type of content as highly undesirable. However, the recent pushback shows that many creators feel that they are struggling to keep up.

Bottom Line

As someone who got his start talking about scraping and spinning, it’s always strange to see it back in the search engine news headlines. Though the tools have definitely become more powerful and some of the details have changed, the broad strokes are unchanged after more than 18 years.

On one hand, this is frustrating. It can feel like, after 18 years of work and effort with both the law and technology, little has changed. However, it’s always been a game of cat and mouse. Spammers and Google have done this dance for decades.

Creators are just caught in the middle of it. But that does mean that creators should be aware of what is going on and follow the best practices for reporting and dealing with the spam sites that do break through.

For cases where filing a copyright notice is appropriate, that is still likely the best approach. However, in cases where it isn’t, filing a spam report is still an option and Nguyen’s comments seem to indicate that they do take such reports seriously.

In the end, we can only hope that another Panda/Farmer moment is on the horizon. It may have taken more than a decade for the scrapers to recover, but it’s clear that they have, and Google needs to get this issue back under (better) control.
