Back in 2013, I wrote an article about the two types of plagiarism detection tools. While it was an accurate look at the two major types of tools used to detect text-based plagiarism, it was incomplete and didn’t address the more important issue: the mindset of the person performing the investigation.
The truth is that, broadly, there are three kinds of plagiarism analyses that get performed, and each calls for a different approach from the person performing it.
While it’s simple to think of “plagiarism checks” as a single, monolithic thing, it is far from it. The lack of clarity around that has led to a lot of misconceptions about plagiarism analyses and, in some cases, has led to some very questionable work.
To reach a better understanding we have to first look at the major types of plagiarism analyses, how they are different and why one might take such an approach.
Type 1: Verifying an Unknown Work
The most common type of plagiarism analysis is an attempt to verify an unknown work. In this situation, we’re looking at a work that may or may not contain plagiarism and trying to determine if it does.
This is what most students go through when they submit an essay, what some publishers do with submitted work and what news organizations do to their reporters.
The mindset here is one of the unknown. Though it is generally assumed that the work is not plagiarized, it is checked in an effort to confirm or disprove that assumption. Regardless, the focus is on the work that was submitted, not the things it was matched against, because it is assumed that any matches represent potential issues for the work being examined, not the other way around.
As such, the UI for such tools puts the focus on the work being examined. These tools report what percentage of that work matches content elsewhere and try to make it easy to determine the nature of any copying that was found. The outside matches are only important insofar as they may have been sources for the work being examined.
Generally, the tools used to perform this search need to have a large database of external content to compare against. The reason for this is simple: Since we don’t know if the work was plagiarized or not, we can’t know where it might be plagiarized from. Since a plagiarism detection tool can’t spot overlaps it doesn’t see, a bigger database means more chances to spot plagiarism.
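To make the "percentage matched" idea concrete, here is a minimal sketch of how such a score could be computed. It is not how any particular commercial tool works. The function names, the use of five-word "shingles" (overlapping word sequences) and the in-memory list standing in for a database are all assumptions made for illustration.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in text.

    Text is lowercased and punctuation is dropped (an illustrative choice).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def percent_matched(submission, database, n=5):
    """Percentage of the submission's shingles found anywhere in the database.

    `database` is a hypothetical stand-in: just a list of document strings.
    """
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    known = set()
    for doc in database:
        known |= shingles(doc, n)
    return 100.0 * len(sub & known) / len(sub)
```

Note that the score is calculated entirely from the submitted work's point of view, which mirrors the Type 1 mindset. It also shows why database size matters: a document that is not in `database` can never contribute a match.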
Anyone who has received an originality report or similar analysis of their work is already familiar with this kind of check. It is by far the most common and is performed countless times every day in schools and offices across the world.
Type 2: Comparing a Known Clean Work to Others
Sometimes referred to as “infringement detection”, this type of check does the exact opposite of the first type. It takes a work that is known to be free of plagiarism, usually because it was created by the one performing the check, and compares it against outside works to find plagiarized (and likely infringing) copies of it.
The tools for performing these checks have many of the same requirements as those performing the first type. They need to have a large, regularly updated database and they need to be able to quickly search through it. However, on the interface side, things are flipped. The interest here is on the matches, not the examined work, and the tool needs to make it easy to examine outside sources and needs to display all results that are found.
With Type 1 checks, there is often a limitation on the number of matches shown. The reason is that, once you get past a certain point, it’s clear that the work was heavily copied and additional matches are superfluous. However, with Type 2 checks, such a limit means missing out on potentially important copies of the work.
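The difference in focus can be sketched in code. The example below is only an illustration under stated assumptions: `find_copies`, the four-word shingles, the `min_share` threshold and the dictionary of candidate texts are all hypothetical, not taken from any real tool. The point is that the score is measured against the known original and that, by default, no cap is placed on the results.

```python
import re

def word_shingles(text, n=4):
    """Return the set of overlapping n-word sequences in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def find_copies(original, candidates, n=4, min_share=0.2, limit=None):
    """List outside works that reuse a share of the known original.

    `candidates` maps a name (for example a URL) to that work's text.
    Returns (name, share) pairs, highest share first. `limit=None` means
    every qualifying match is returned, which is what a Type 2 check wants.
    """
    source = word_shingles(original, n)
    results = []
    if not source:
        return results
    for name, text in candidates.items():
        share = len(source & word_shingles(text, n)) / len(source)
        if share >= min_share:
            results.append((name, share))
    results.sort(key=lambda item: item[1], reverse=True)
    return results if limit is None else results[:limit]
```

Passing something like `limit=1` imitates the cap that is reasonable for a Type 1 report, and shows how it would silently drop other copies in a Type 2 search.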
These types of checks are less common and, as such, fewer tools exist to perform them. That said, tools intended for Type 1 checks can often be used, albeit with some difficulty, to perform these checks. However, search engines themselves are often the best tools for such examinations as they return all of the results they can, have large databases and are free to use.
One area where there is a lot of growth for this kind of analysis is non-textual copying. Tools such as Content ID and Audible Magic perform these kinds of examinations, but for audiovisual works. There simply hasn’t been as much interest in performing these kinds of checks on textual works. This is true even though, for many, their writing is valuable and created at great cost.
Type 3: Comparing a Suspected Plagiarized Work Against Its Sources
Finally, we have a third type of plagiarism analysis and one that often comes after either a Type 1 or Type 2 analysis: It’s the examination of a suspected plagiarized work against its sources.
This might seem unnecessary. If we’ve done either of the other types of plagiarism analyses, we likely already have some idea about the nature of the overlaps.
That said, there are two reasons why one may want to perform a separate and more direct analysis. The first is that the tools used in other types of plagiarism analyses focus on casting a broad net, which means their specificity often suffers. The second is that not every case of plagiarism is first detected by an automated tool, as readers often recognize overlaps before a machine ever gets to look at the work.
Here, the size of the database is irrelevant as all of the documents of interest are already known. This allows us to avoid worrying about the internet at large and focus on specificity. Tools used for these kinds of examinations often allow you to customize how the automated analysis is performed, letting you drill down much deeper and with much greater flexibility.
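As a rough illustration of this kind of adjustable, one-to-one comparison, the sketch below uses Python’s standard `difflib.SequenceMatcher` to list word sequences shared by two known documents. The function name `shared_passages` and the `min_words` setting are assumptions for this example, and real tools built for this purpose offer far more control.

```python
from difflib import SequenceMatcher

def shared_passages(suspect, source, min_words=5):
    """Return word sequences that appear in both texts.

    Only runs of at least `min_words` identical words are kept. Lowering
    the value surfaces shorter overlaps; raising it shows only long ones.
    The comparison is case-sensitive and splits on whitespace.
    """
    a = suspect.split()
    b = source.split()
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent items in long sequences.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(a[block.a:block.a + block.size]))
    return passages
```

One caveat: `SequenceMatcher` aligns matching blocks in order, so passages that were rearranged between the two documents may be missed. That is one more reason a human reading of the works still matters at this stage.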
However, we are also able to get away from technology altogether and look at the works more broadly. There are types of plagiarism that can be difficult if not impossible for machines to detect. This includes plagiarism of ideas, themes, concepts, etc.
By limiting the number of sources being compared, it becomes possible for humans to analyze these similarities and draw useful conclusions.
These types of checks are extremely common, both as follow-ups to Type 1 analyses and in legal cases. For expert witnesses, these are often the most common types of analyses they are asked to perform. The reason is that such a case is most likely looking at whether the overlaps constitute copyright infringement, meaning we are examining only a small number of works, but in great detail.
Whether you are performing a plagiarism analysis or asking for someone else to do one on your behalf, it’s important to understand what the different kinds of analyses are so you know what you are trying to achieve.
The tools and approaches that work in one scenario do not always work well in another. This can cause you to miss something in your analysis or simply waste your time going down the wrong path.
Knowing what type of analysis you are seeking and what you need to do it is an important first step in performing any examination.
To that end, this guide is meant to give a brief look at the three overarching types of plagiarism analysis. Every analysis is different and brings with it unique challenges. This is merely intended as a starting point for how to look at each case; it’s by no means the final word.
If there’s one thing I’ve always found to be true with plagiarism analyses, it’s that every case has a few surprises built into it. As such, it’s always a good thing to be flexible with your approach as the attack you start with may not be what works best in the end.
Despite this, knowing where to start and how to think about the case is extremely useful. If nothing else, it provides a solid foundation to build upon, something every investigation needs.