Back in February, hot on the heels of the Christiane Serruya plagiarism scandal, we took a look at how Amazon could fix its massive plagiarism problem.
The idea was fairly straightforward. Simple, cursory plagiarism checks as works are submitted for publication would help detect a large percentage of would-be plagiarists and would discourage the practice more broadly. However, we noted that such an effort would likely still require a massive human investment that Amazon is likely either incapable or unwilling to make.
Unfortunately for Amazon, a recent report by David Streitfeld at the New York Times highlights that the problem goes much deeper than plagiarism. Counterfeit books, AI-generated biographies and “bait and switch” titles are also rife on the service.
According to the article, this is largely due to Amazon’s “hands off” approach to its store and its assumption that all of its sellers are operating in good faith until an issue is brought to its attention.
Amazon has responded to the story and said that they have invested over $400 million in personnel and tools to stop fraud and abuse. However, Amazon did acknowledge the complaints to a degree saying that “The Times cites a small number of complaints and we recognize our work here isn’t done. We will not stop until it’s zero.”
But the truth is that it’s nowhere near zero. According to the Authors Guild, counterfeiting on Amazon is seeing a “massive rise” and Amazon itself has acknowledged the issue, listing counterfeiting as a risk factor in its financial disclosure forms.
There, they said they “May be unable to prevent sellers in our stores or through other stores from selling unlawful, counterfeit, pirated, or stolen goods, selling goods in an unlawful or unethical manner, violating the proprietary rights of others, or otherwise violating our policies.”
So what is going on? How is Amazon spending millions and having the problem only grow? The reason is that Amazon has built a platform that’s too big to police and it shares that honor with another site we all know very well.
The YouTube Problem
As we discussed back in May, YouTube has become almost as well known for its copyright failings as it has for being the host of videos. Its Content ID system routinely makes mistakes and flags non-infringing content while, at the same time, it is still very easy to find infringing videos on the site.
But the problem goes far beyond copyright. YouTube has become a haven for a wide variety of objectionable content including hate speech, terrorism, conspiracy theories, sexualization of minors and much, much more.
YouTube has responded to these issues in much the same way that it’s responded to copyright issues, with a mixture of technology and policy changes, whether that’s demonetizing certain kinds of content, using algorithms to remove unwanted material or applying age gates to hide certain content from minors.
However, all of these approaches have one thing in common: Humans are never the front line of defense.
The reason is quite simple: they can’t be. With an estimated 500 hours of video being uploaded to the site every minute, there’s simply no way. Even if only a fraction of a percent of all videos have copyright, community guideline or other issues (which would be phenomenally low), that’s still far more than human reviewers can handle.
So, YouTube uses bots, like Content ID, to detect and stop most of the problems. According to YouTube, Content ID handles about 98 percent of all copyright issues on the site. Even if we assume that Content ID is 99% perfect, that’s still a large number of videos that require human intervention.
And this is an issue that exists across all kinds of unwanted content. Even if the bots are 99% perfect and handle 98% of the problems, there’s still a large amount of content that needs manual policing and YouTube is struggling to keep up.
For example, if there are 500 hours of content being uploaded to YouTube every minute, that’s 30,000 hours per hour. If just 5% of that is copyright infringing (or has some kind of copyright issue), that’s 1,500 hours of content with copyright issues. If Content ID handles 98% of that, that leaves 30 hours of content that has to be dealt with another way. If just 1% of the 1,470 hours that Content ID did handle was flagged in error, that creates another 14.7 hours of content that requires human intervention every hour.
If we assume the average video is 4 minutes and 20 seconds long, that works out to about 619 (Correction: Original post said it was 69, not sure if my math or typing failed, both are equally atrocious. Thank you Stephen for the heads up!) videos per hour that require human intervention on copyright issues. That’s more than ten per minute and that’s JUST looking at copyright. That says nothing about other community guidelines issues.
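For anyone who wants to check the arithmetic, here is a rough sketch of the estimate in Python. Every input is one of the guesstimates used above (the 5% issue rate, the 1% error rate and the average video length are assumptions, not measured figures), and the function and variable names are ours, purely for illustration.

```python
# Back-of-the-envelope estimate built on the article's assumed figures.
UPLOAD_HOURS_PER_MINUTE = 500        # estimated upload rate
ISSUE_PERCENT = 5                    # assumed share of uploads with a copyright issue
CONTENT_ID_HANDLED_PERCENT = 98      # share of copyright issues Content ID handles
FALSE_FLAG_PERCENT = 1               # assumed error rate among what Content ID handles
AVERAGE_VIDEO_SECONDS = 4 * 60 + 20  # assumed average video length


def human_review_load():
    """Estimate how much content needs human attention each hour."""
    uploaded_hours = UPLOAD_HOURS_PER_MINUTE * 60  # hours uploaded per hour
    issue_hours = uploaded_hours * ISSUE_PERCENT / 100
    handled_hours = issue_hours * CONTENT_ID_HANDLED_PERCENT / 100
    unhandled_hours = issue_hours - handled_hours
    false_flag_hours = handled_hours * FALSE_FLAG_PERCENT / 100
    review_hours = unhandled_hours + false_flag_hours
    videos_per_hour = review_hours * 3600 / AVERAGE_VIDEO_SECONDS
    return {
        "uploaded_hours": uploaded_hours,
        "issue_hours": issue_hours,
        "unhandled_hours": unhandled_hours,
        "false_flag_hours": false_flag_hours,
        "review_hours": review_hours,
        "videos_per_hour": videos_per_hour,
    }


if __name__ == "__main__":
    for name, value in human_review_load().items():
        print(f"{name}: {value:,.1f}")
```

Changing any of the constants at the top shows how sensitive the final number is to the guesses that feed it.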
Even with some admittedly very impressive technology, YouTube is struggling to keep up with enforcement on the human side. There is so much content that even what falls through the cracks is drowning them.
YouTube cannot keep up and it can’t program its way out of this problem.
Too Big to Fail, Too Big to Police
The problem is pretty straightforward. Amazon and YouTube, as well as other sites like Facebook and Twitter, pushed for rapid growth and they got it. Along the way, they created some pretty cool things and changed the world.
However, they also created platforms that are functionally impossible to police. Even after applying technology, abuse is simply too rampant to stop. Once humans are required, they are outmatched.
This means that, if you’re a bad person wanting to do bad things on one of these sites, your war isn’t with the terms of service, the humans that run it or even the law. It’s with the technology enforcement layer. If you can circumvent Content ID on YouTube or Amazon’s algorithms, you’re likely fine for a lengthy period of time.
Even if Content ID only misses half a percent of what it should have caught, using the guesstimates above, that’s still 7.5 hours of infringing material being successfully uploaded every hour of every day.
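As a quick sketch of that figure, the calculation below uses the same guesstimates as before (500 hours uploaded per minute and an assumed 5% of uploads with a copyright issue). The function name and its parameters are hypothetical, chosen only for this illustration.

```python
def missed_hours_per_hour(miss_percent, upload_hours_per_minute=500, issue_percent=5):
    """Hours of infringing material that slip past detection each hour.

    All defaults are the article's guesstimates, not measured values.
    """
    uploaded_hours = upload_hours_per_minute * 60  # hours uploaded per hour
    issue_hours = uploaded_hours * issue_percent / 100
    return issue_hours * miss_percent / 100


if __name__ == "__main__":
    missed = missed_hours_per_hour(0.5)
    print(missed)       # 7.5
    print(missed * 24)  # 180.0 hours per day
```

Under these assumptions, a miss rate of half a percent adds up to 180 hours of infringing material a day.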
To make matters worse, legitimate users are playing the same game. Those that use copyright-protected content in a way that’s compatible with fair use are fighting the same battle as pirates. It’s not one against the law, but against the technology.
To be clear, this doesn’t just apply to copyright. For example, history channels about World War 2 often get demonetized and dropped from recommendations because they discuss Nazis, and various LGBT videos have been demonetized due to alleged sexual content. While YouTube doesn’t owe a platform or monetization to anyone, these are obvious examples of legitimate channels getting trapped by bots.
That, in turn, is the exact problem. When you have enforcement that’s run by bots and no hope of adequately handling the inevitable mistakes, legitimate users get caught in the crossfire and plenty of actual bad guys escape.
There are certainly reasons that Amazon, YouTube, Facebook and others might not want to tightly police their walled gardens. But, even if they did, there’s serious doubt that they ever could.
They’ve simply created a world that’s too big, too open and too lawless to ever hope to effectively police. The tools they have are never going to be up to the task.
Sadly, there’s no easy answer to this problem. Tech companies are inevitably going to look at this as a tech problem, something to be solved with better algorithms or other software.
However, it’s not a tech problem. It’s a numbers problem. Even if the technology were 10x better, there would still be an unacceptably high amount of unwanted content and, to make matters worse, those gains would be eliminated as the platform grows and users, both legitimate and illegitimate, seek ways to game the system.
Though I am generally very pro-technology, one criticism that tech companies routinely face is that they build mammoth world-changing technologies, grow them to a mammoth scale and think about or deal with the consequences later.
Here that criticism is certainly very warranted. Amazon and YouTube have both created mammoth platforms that are virtual monopolies in their fields. However, they’ve both proven incapable of fully dealing with users not operating in good faith and that problem has only grown as they have reached the scale they are at.
Tech companies have built a problem that tech alone cannot get them out of. To make matters worse for Amazon, they seem to recognize this but have instead put the obligation on creators, launching Project Zero which gives brands the authority to immediately remove allegedly infringing listings.
Amazon knows they can’t solve these problems and, rather than rethinking their structure, are instead passing it off to brand holders. Still, that may be better than YouTube, which seems to be fully in denial.
In the end, there’s no easy or good answer to this problem. The platforms that have been built can’t be effectively policed and that is unlikely to change any time soon.
The question isn’t if, but how, these platforms will change in response to that fact.