A.V. Club’s AI Reporter Plagiarized IMDb
Back in June, G/O Media, the company that owns A.V. Club, Gizmodo, Quartz and The Onion, announced that they would be experimenting with AI tools as a way to supplement the work of human reporters and editors.
However, just a week later, it was clear that the move wasn’t going smoothly. A report by Mia Sato at The Verge highlighted issues with the transition, including wildly inaccurate articles and even an admonishment from the Gizmodo union, which asked readers not to click on articles credited to the AI.
Now, several months later, it doesn’t appear that things have improved. If anything, they might have gotten worse.
The reason is highlighted in a report by Frank Landymore and Jon Christian at Futurism. They compared the output of A.V. Club’s AI “reporter” against the source material, namely IMDb. What they found were examples of verbatim and near-verbatim copying of that material, without any indication that the text was copied.
To be clear, G/O does have a license with IMDb to use content from the site. Though the exact terms of that license are unknown, the company has at least some degree of permission to use IMDb content.
It’s also worth noting that the articles do disclose that the information comes from IMDb. The articles in question have a note that reads as follows:
“This article is based on data from IMDb. Text was compiled by an AI engine that was then reviewed and edited by the editorial staff.”
However, as the Futurism report notes, that disclosure does not indicate that any text is copied, only that “data” is used. The text is supposed to be “compiled” by the AI and then “reviewed and edited” by humans.
This raises the simple question: What value is the AI “reporter” adding in this scenario? Also, what do we do when an AI reporter commits an ethical violation that would get any human reporter disciplined or fired?
Understanding the Allegations
The allegations themselves are remarkably straightforward. According to the report, the Futurism reporters examined articles written by A.V. Club’s bot and compared them to the source material.
Take, for example, this A.V. Club list of movies with the NC-17 rating. This is how it described the movie Young Adam.
A young drifter working on a river barge disrupts his employers’ lives while hiding the fact that he knows more about a dead woman found in the river than he admits.
Compare that to IMDb’s description of the film:
A young drifter working on a river barge disrupts his employers’ lives while hiding the fact that he knows more about a dead woman found in the river than he admits.
In short, they’re identical.
Though Futurism did find at least one other example of verbatim copying, there was also near-verbatim copying, including this description of Jessica Frost that A.V. Club used on its list of August movies:
A young woman tries to discover why a time-traveling psychopath is after her, leading to a journey through the desert, time, space and her family’s past.
IMDb’s version is longer, but still very similar.
A young woman searching for the truth about why a time-traveling psychopath is after her, is thrown into a turbulent journey through the desert, time, space and her family’s past.
In both A.V. Club lists, there is no additional text or framing beyond the movies and the descriptions, which are all based on IMDb descriptions and, as seen in this case, sometimes copied directly or nearly directly from them.
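This kind of overlap is trivial to detect. As a minimal sketch, the snippet below uses Python’s standard difflib module to score the similarity of the two Jessica Frost descriptions quoted above; the tooling here is my own illustration, not anything Futurism describes using.

```python
from difflib import SequenceMatcher

# Description published in A.V. Club's AI-generated August movie list
av_club = (
    "A young woman tries to discover why a time-traveling psychopath is "
    "after her, leading to a journey through the desert, time, space and "
    "her family's past."
)

# Description as it appears on IMDb
imdb = (
    "A young woman searching for the truth about why a time-traveling "
    "psychopath is after her, is thrown into a turbulent journey through "
    "the desert, time, space and her family's past."
)

# SequenceMatcher.ratio() returns a similarity score between 0.0 and 1.0.
# Near-verbatim pairs like this one score far above what two independently
# written summaries of the same film would produce.
score = SequenceMatcher(None, av_club, imdb).ratio()
print(f"Similarity: {score:.0%}")
```

A fully verbatim match, like the Young Adam description, would of course score 100%.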
There’s not much doubt that this is plagiarism. Though A.V. Club acknowledges that the “data” came from IMDb, it doesn’t indicate that the language does. There are no quotation marks, no blockquotes, nothing to indicate that portions are copied verbatim or near-verbatim.
A human reporter who did this would face serious consequences. In fact, just yesterday, I reported on a similar case of plagiarism committed by a reporter at the Philadelphia Inquirer that resulted in the reporter being fired.
However, for the AI reporter, no reckoning is coming. G/O made it clear in their original announcement that they anticipate “errors” like this and, though they said they would work to correct them (the copied descriptions above remain unchanged), such errors are apparently acceptable and anticipated issues.
When the AI Reporter Goes Rogue
The issues with using AI in journalism are already well known. Back in January, I wrote about CNET’s AI issues, in particular how factual errors in AI reporting mixed with allegations of plagiarism and unclear bylines to prompt a strong reader backlash and force CNET to “pause” the program.
In June, CNET revamped its policies around AI, scaling back its use and saying that stories would not be written entirely by AI. This included editing and correcting more than half of the 70 stories that its AI reporter had published before the pause.
In that light, the latest report is hardly surprising. AI is deeply flawed: it is prone to errors, routinely commits plagiarism and generally produces low-quality work, especially in a journalistic environment.
None of this is a secret. All of it is well known, well understood and backed up with both hard data and mountains of anecdotal evidence. Companies that continue to lean into using AI in journalism cannot feign ignorance.
But we’ve seen this before. Benny Johnson, for example, is an irredeemably unethical reporter with a history of plagiarism, fabrication and other ethical issues that resulted in him being fired from multiple publications.
Yet, he’s never been left wanting for a job. Publications know that, because of his name, he will draw clicks and engagement. There’s simply too much money to be made. Even in journalism, ethics have a price.
From a business perspective, AI is not very different from Benny Johnson. Though the flaws and integrity issues are well known, the allure of a free reporter who can generate countless articles at the push of a button is simply too great to ignore.
But therein lies the problem: if you want AI to function like an actual reporter, it has to be edited, fact-checked and plagiarism-checked just like a real human.
However, when one does those checks, the errors quickly become apparent and fixing them often takes more time and resources than just starting with a human author.
In short, using AI in a way that helps a company earn or save money means accepting that the factual errors and plagiarism are just part of the deal. It means completely forgoing journalism ethics, just like hiring a reporter like Benny Johnson.
Right now, for a publication, there is no ethical use of AI that is not either unprofitable or extremely limited. These “experiments” in AI are not about testing what the bots can do, but about seeing how far publications can lower their ethical and quality standards and still find an audience.
Bottom Line
For publications, the question of AI, at least as of right now, comes down to both quality and ethics.
If you want to do reporting that is both ethical and of a high standard, you cannot use an AI reporter.
Even if AI advances and, someday, is able to generate great reporting at the push of a button, it is clearly not there today. Plenty of “experiments” have highlighted the quality and ethical issues of AI-based reporting. This is a known problem.
Any experiments in AI reporting need to be viewed as what they are: Experiments in seeing if increasing quantity while reducing quality and ethical standards can generate more profit.
Sadly, as we’ve seen before, those experiments are likely to work. However, we should at least be honest about it.
In the end, this isn’t about the mistakes of an AI reporter or the issues A.V. Club or other sites have had when trying to edit and correct AI-generated content. It’s about the future of journalism and whether quality and ethical work is valued.
In that light, the arc of history has been pulling publications toward larger quantities of lower quality content for some time. AI is just the latest escalation in that trend, and one that publishers are unlikely to ignore.
Even if it destroys their credibility.