Why Amazon is Overrun with Plagiarism and AI Garbage

Jonathan Bailey, April 17, 2024

Yesterday, Constance Grady at Vox published an article about the author (and fellow Vox employee) Kara Swisher, who recently published her latest book, Burn Book, on Amazon’s Kindle store.

However, shortly after the book went live, new titles that used her name began popping up. These included at least one “biography” of her and other books that featured her name in the title, such as Kara Swisher: Silicon Valley’s Bulldog and Kara Swisher Biography: Unraveling the Life and Legacy.

As Grady correctly guessed, the cause of this is likely search engine optimization (SEO) spam. Spammers saw increased searches for Swisher’s name, largely due to her popularity as a journalist/podcaster and her impending book release, and simply created titles to capitalize on it.

The titles were most likely “written” using a combination of AI generation and low-cost ghostwriters/editors. The goal was simply to rank well for Swisher’s name and sell a few copies to customers who were either confused or simply wanted more to read.

The problem is annoying for authors and consumers alike. However, Amazon has proved either incapable or unwilling to address the issue adequately. Their lackluster vetting processes all but roll out the red carpet for garbage ebooks.

However, this is far from a new problem. The Kindle Store has had problems with plagiarism, low-quality ebooks and spam for over 15 years, well before generative AI. Worse, Amazon has often enabled spam rather than fighting it, to the detriment of everyone who uses the store.

A Long History of Questionable Books

In early 2009, Amazon made Kindle Publishing for Blogs open to everyone. This service allowed websites to publish their content on the Kindle store and charge a subscription fee. However, there was a glaring error in the process.

In short, Amazon could not verify that the person setting up the site was the owner. Anyone with access to the RSS feed could sign up a site and sell subscriptions. This included fake subscriptions to popular sites like TechCrunch, which were trivial to set up.

Fortunately for websites, the service never saw much use. Amazon quietly discontinued it in August 2019, citing low usage for the closure.

However, Amazon’s lack of concern about copyright or plagiarism would become a recurring theme. It started in January 2012 with an erotica plagiarism scandal that saw dozens of authors complain about their books being reuploaded and sold under new names. Authors in other genres would complain about similar issues over the coming years.

By mid-2016, it was obvious that Amazon and its Kindle Store had a significant copyright and plagiarism problem. One journalist even managed to plagiarize a 2008 book wholesale and become a bestseller on the site.

At the time, the potential solutions were fairly simple. Amazon needed to implement basic plagiarism checks and set up processes for duplicative works. While it wouldn’t have fixed every problem, it would have weeded out the worst offenders.
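To give a sense of what a “basic plagiarism check” could look like, here is a minimal sketch that compares a new upload against existing titles using overlapping word sequences (shingles) and Jaccard similarity. It is an illustration only, not a description of anything Amazon runs: the function names, the five-word window and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_duplicative(new_text: str, catalog: dict, threshold: float = 0.5, k: int = 5) -> list:
    """Return catalog titles whose overlap with new_text meets the threshold.

    The threshold is an arbitrary value picked for illustration.
    """
    new_sh = shingles(new_text, k)
    return [
        title
        for title, text in catalog.items()
        if jaccard(new_sh, shingles(text, k)) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical catalog entries, invented for this example.
    catalog = {
        "Original": "the quick brown fox jumps over the lazy dog near the river bank",
        "Other": "completely different words appear in this unrelated sentence about cooking pasta tonight",
    }
    upload = "The quick brown fox jumps over the lazy dog near the river bank."
    print(looks_duplicative(upload, catalog))
```

A check like this only catches wholesale copying. A catalog the size of the Kindle Store would need something more scalable, such as MinHash or other fingerprinting, and a human review step for flagged titles.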

However, Amazon never took action. Although the issue gained national attention during the pandemic, no major changes were implemented. Amazon’s largest enforcement crackdown of the past few years was against authors they thought had violated the exclusivity agreement, even though they were victims of piracy.

Now, with the widespread availability of generative AI, Amazon is seeing a whole slew of new problems. These include reverse plagiarism, where spammers upload AI books but pretend established authors wrote them, and the aforementioned spam issue, where authors flood the store with keyword-friendly generated books in hopes of confusing readers.

After more than a decade of neglecting these issues, Amazon is becoming overwhelmed with the new wave of spam, and there may not be much it can do.

What Amazon is Trying to Do and Why It’s Failing

To be clear, Amazon has been changing some policies. In September 2023, the site announced that it would limit authors to publishing just three books per day. Months later, in December, the company announced that authors publishing AI-generated content would be required to disclose the use of AI.

Both policies aim to curb the flood of AI writing. However, both appear to be ineffective. Reports of AI books flooding the store continue, with one reporter saying that Amazon promotes the books via Kindle ads. Perhaps most worrisome, many of these ads are for AI-generated children’s books.

The reason for this failure is simple: Amazon’s policies are far too little and far too late.

ChatGPT became publicly available in November 2022. It took Amazon nearly a year to limit the number of books that could be published and another three months to require the disclosure of AI writing. This is a dreadfully slow pace for a company central to the publishing industry.

However, they likely wouldn’t have succeeded even if the policies had been implemented quickly. Limiting authors to three books per day seems reasonable, but it still means that an individual author can publish over 1,000 books in a year.

And if the limit does prove to be an obstacle, workarounds exist, including setting up other accounts or using middlemen. In short, the limit will only really impact the worst of the worst when it comes to AI publishing, and even then, likely not much.
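The arithmetic behind that cap is easy to check. The short sketch below assumes the three-per-day limit applies per account and that a spammer publishes every day of the year. The account counts are hypothetical and only there to show how quickly the ceiling rises.

```python
BOOKS_PER_DAY = 3    # cap Amazon announced in September 2023
DAYS_PER_YEAR = 365


def yearly_capacity(accounts: int = 1) -> int:
    """Maximum books per year, assuming the cap applies per account."""
    return BOOKS_PER_DAY * DAYS_PER_YEAR * accounts


if __name__ == "__main__":
    print(yearly_capacity())   # 1095 books from a single account
    print(yearly_capacity(5))  # 5475 books across five hypothetical accounts
```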

The disclosure requirement is also dead in the water. First, it gives tacit approval to publish AI writing on the service, only limiting it when it isn’t disclosed. Second, it’s unclear how Amazon would detect or block AI writing that isn’t disclosed.

As we discussed last week regarding Medium’s new policy, AI detection is a fraught space. A large number of tools are available, but their effectiveness varies wildly. Furthermore, since it’s unclear how AI detectors reach their conclusions, verifying or disproving the findings is impossible.

In short, the policy doesn’t do anything to stop AI writing, and even when it is violated, it’s unclear how it will be enforced. Sadly, it’s largely meaningless.

That, in turn, neatly summarizes the Amazon situation: policy changes that come too late and have no teeth.

Bottom Line

Of course, none of this should come as a surprise. Amazon has a 15+ year history of ignoring and enabling spam, plagiarism and other garbage work in the Kindle Store. The company has failed to take even the most basic precautions to prevent copyright infringement, plagiarism and spam content.

Why would things meaningfully change with AI? Not only is this a much more difficult problem for Amazon, but the company has invested billions in AI systems. Clamping down on AI in a meaningful way would not only be incredibly difficult, but it would be against their business interests.

But those same business interests have always been in the way. Putting up guardrails against plagiarism would mean spending money on systems and people to run them. All that would be achieved would be removing books from the store, and Amazon earns the same cut of a sale whether the book is original, plagiarized or AI-generated.

Amazon doesn’t have a reason to care, and they likely won’t have a reason until there’s meaningful competition. Until then, expect the Kindle Store to continue featuring AI garbage, plagiarized books and lots of other spam.
