Earlier today, the Israeli-based plagiarism detection service Copyleaks announced a new $6 million round of funding to help it further develop its product. Though a relative upstart in the field, having been founded in 2013, Copyleaks has made steady progress with both its product and its business.
Much of that success and interest stems from the fact that its product is powered by artificial intelligence. Though the company offers traditional copy-and-paste plagiarism detection, the service also aims to detect the "voice" of the author and the meaning of the text, hunting for sources that match it too closely even when the text itself isn't duplicated.
Copyleaks is not alone in this effort. In 2018, Turnitin launched its Authorship Investigation tool, which aimed to help teachers detect when a paper was not written by the student who submitted it. The goal there was to look at a particular student's writing style and see whether a new work matches their previously known work.
Though Turnitin doesn't use the term artificial intelligence to describe Authorship Investigation, the goal is very much the same: to make plagiarism detection smarter and move past merely matching strings of text.
The trend is also visible in research. A recently published study by a group of Saudi Arabian researchers explored using AI to examine not just the text of a published paper but also its images. The goal was to understand the meaning of the images, as well as any text contained within them, to determine whether their concepts and ideas were similar to those in other published works.
Simply put, plagiarism detection is changing, with multiple groups working to make it smarter and useful in more cases. The technology is undergoing something of a reinvention as it adapts both to today's needs and to those of the likely future.
The Need for Smart Plagiarism Detection
For most of their history, plagiarism detection tools have been fairly straightforward. They've excelled at detecting copied-and-pasted text, even when it has been somewhat rewritten. While this was an impressive technological feat in 2000 (and still is today), users often find they can get similar results using a regular search engine and a bit of patience.
Over the years, improvements to these tools focused on areas such as detecting synonyms or changed words, detecting translated plagiarism, countering the various tricks students used to fool the software, and making the tools easier to use.
However, none of this really expanded the usefulness of plagiarism detection software. It remained a tool for detecting certain types of plagiarism and, even as it became more intertwined with grading and feedback processes, was still limited in what it could do.
Students picked up on this and began to seek out ways to cheat that couldn't be detected. Ten years ago, essay mills were low-quality but slowly gaining popularity. Today, essay mills, along with contract cheating more broadly, are among the most pervasive challenges to academic integrity.
The logic is simple enough: plagiarism detection tools can't (or couldn't) tell who wrote an essay, only that it isn't similar to anything in their database. So even if a teacher suspects something is wrong, the software won't flag it.
This, along with the growth of AI writing more broadly, has led to a push to go beyond text matching. Much as an AI paraphrasing tool or an AI writer attempts to understand language, plagiarism detection AI aims to understand words or images to determine whether a work is similar to others in ways that go beyond word choice.
This could be significant, not just as a means of combatting contract cheating, but as a way of understanding authorship more completely.
Expanding the Usefulness
In general, plagiarism detection tools cannot detect when someone steals an idea, paraphrases well, or plagiarizes in any other way that doesn't involve copying at least some text.
This is because computers can easily compare strings of text but struggle with more complex and nuanced elements, such as the theme of a piece or the writing style of its author. The push into AI is meant to address that limitation.
While the more obvious applications are detecting essay mill papers and AI-created works, the technology could also help create better writers. A better understanding of how a paper was written, even one that isn't plagiarized, can help instructors guide students in improving their work.
It could also help identify students who are struggling, so teachers can intervene before plagiarism happens. This would turn a plagiarism detection tool into a potential plagiarism prevention one.
However, given both the newness of the technology and the unease many educators have about plagiarism detection software in general, such a future may be a long way off.
Instead, the more immediate uses may be research-focused: learning how works are written and spotting similarities and influences that even the original author was not consciously aware of.
The goal here is not to find plagiarism, but to understand both the creative and functional processes that go into writing.
In short, where an AI writing tool attempts to understand language so it can mimic it, AI in plagiarism detection tools attempts to grasp language in order to understand the process behind it.
If used well, it could be a valuable tool.
In the end, the plagiarism detection scene is evolving. Companies are moving well past copy-and-paste plagiarism detection and are trying to understand the writing they process on a deeper level.
Though the aim is to detect types of plagiarism that were previously undetectable, the effort may bring additional benefits if the technology matures and gains enough acceptance.
To be clear, the challenges facing these companies are huge. What they are attempting to do is mammoth in scale and complexity.
However, AI writing is coming. In many ways, it is already here. The current approaches to plagiarism detection, while still very useful in many cases, need to be re-examined as the environment around them shifts.
Because of that, we're entering a very different phase of plagiarism detection, one driven by AI and machine learning. It's very likely that, in 5 to 10 years' time, plagiarism detection tools will look very different from the ones we have today. At the very least, they will likely have very different capabilities.
Simply put, the industry has gone from one of incremental change to one of drastic change. It is going to be a space to watch for a long time to come.