Is Plagiarism a Feature of AI?
Earlier this week, Variety reported that the Writers Guild of America (WGA) was in negotiations with the Alliance of Motion Picture and Television Producers (AMPTP). One of the hot topics at the meeting was the use of AI in writing, which, according to the original reporting, the WGA was somewhat receptive to.
However, WGA East felt that was a misrepresentation of their views and published a statement on Twitter fully explaining their position.
According to that thread, they were not open to any use of AI that undermined “writers’ working standards including compensation, residuals, separated rights and credits.” Instead, they said that AI’s role would be limited to research material and that AI writing “had no role in guild-covered work.”
For those still in doubt, the thread ended with the strongest condemnation of all, simply saying, “To the contrary, plagiarism is a feature of the AI process.”
The WGA’s objection centered on how AI works: by ingesting outside works and using them to write new material. The WGA made it clear that they feel “AI software does not create anything. It generates a regurgitation of what it’s fed.”
But, while the WGA was discussing the principles of AI, Tom’s Hardware was reporting on a much more specific example of AI plagiarism. They were testing out Google’s new AI, Bard, and asked it which of two CPUs was faster. Bard ended up plagiarizing an earlier report from Tom’s Hardware’s own site.
The plagiarism included similar language, all of the same information and even the royal “we,” implying that Bard itself had done the testing. Worst of all, Bard didn’t answer the question fully accurately, missing important nuance and details.
To be clear, this isn’t the only time an AI has been directly accused of plagiarism. Back in October, I looked at allegations that GitHub’s Copilot AI was being used to “launder” open-source code. Likewise, Getty Images has filed a lawsuit against Stability AI over allegations that it copied Getty’s images, including its watermark, without attribution or permission.
All of this raises a simple question: Is AI a plagiarism machine? The current answer is likely yes, but it doesn’t have to be that way.
What Does It Mean to Be New?
One of the thornier questions to sort out when dealing with AI is whether it produces anything “new” or “original.” Some, such as the WGA, argue that, since all AIs are trained on massive libraries of human creations, they are incapable of creating anything original.
However, one could equally argue that humans function the same way. We’re all trained on countless works created by other people, and we draw upon all that information and experience when writing anything “new”.
That, in turn, is the problem with focusing on what is “new.” The question of what counts as originality quickly becomes one of philosophy. Furthermore, different creators have different standards for what is new or original. For example, a scientist publishing a paper follows a different set of rules than an author of fictional works.
In short, there’s no way that we, in the course of this article, could come up with any universally accepted definition of “new” as it applies to AI work.
However, that’s not very important.
Because, if we assume that all new works are based to varying degrees on older ones, the real issue becomes citation and attribution. Though those standards, once again, vary based on the type of work, there is usually, at the very least, a standard.
Except with AI.
Currently, none of the available AI systems attribute the sources of their information or inspiration, and that represents a serious problem for the users of those systems.
The Problem of Attribution
Simply put, if you query any available AI platform, you will get a response, but no indication of what information that response was based upon.
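To make that concrete, here is a minimal sketch of what a typical query looks like, using OpenAI’s Python client as one example (the model name and question are placeholders, and other platforms behave much the same way). What matters is what the response does not contain:

```python
# A minimal sketch of querying a large language model. Uses OpenAI's
# Python client as one example; the question is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Which CPU is faster?"}],
)

# The response includes the generated text plus metadata such as the
# model name and token usage. No field identifies the source works
# the answer was based upon.
print(response.choices[0].message.content)
```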
Part of this is, most likely, practicality. In many cases, neither an AI nor its operators may be able to tell where the content originated. Even when they can, the answer is likely a very lengthy list of sources, each of which played only an insignificant role.
But then there are cases like the one at Tom’s Hardware. There, the AI clearly pulled all of its information from a single source and failed to cite it until it was pressed about the matter in a follow-up prompt.
If the AI were a human author, it would rightly be accused of plagiarism for not including a link or at least a mention. Any author writing like that for a newspaper or a book would likely face serious questions.
To be clear, attribution doesn’t change the calculus much when it comes to copyright infringement. Though copyright and plagiarism overlap, with plagiarism we’re looking solely at the citation and attribution of a work, not whether the use violates copyright.
However, those copyright issues are most likely playing a role in ensuring that AIs don’t attribute their sources. After all, attribution is an admission that a particular work was used. If, as in the Tom’s Hardware case, the use is extremely close, it could become a copyright infringement case, with the attribution serving as the first piece of evidence.
Right now, there is a great deal of legal uncertainty around AIs. AI companies benefit from there being confusion about what sources their technology is pulling from in various queries. Attribution would wreck that uncertainty.
Because of that, it’s safe to say that plagiarism is very much a feature of AI. Even when AI could and should cite its sources, it doesn’t.
However, that is not likely to change any time soon, as the legal norms will likely have to be settled before the citation norms can even be discussed. AI already has a legal target on its back, and attribution would only make that target bigger.
Bottom Line
In February, Noam Chomsky said that ChatGPT was “basically high-tech plagiarism.” Couple that with the WGA statement, the Tom’s Hardware case and a myriad of other examples, and there’s clearly an issue with citation and attribution when it comes to AI.
Normally, this would be resolved by creating a citation standard for the technology. We’ve actually seen this happen before. For example, such a standard evolved organically on Twitter. Facebook, on the other hand, took a top-down approach.
However, with AI, only a handful of companies CAN provide attribution. Unfortunately, those companies are best served by not providing sourcing. To them, the legal risks are much bigger threats than the ethical ones of citation.
As such, it’s likely that AI will be a tool of plagiarism for some time. However, those using it won’t likely be aware of the issues until it’s too late.