One of the most common things I do as part of my copyright and plagiarism consulting service is a plagiarism analysis. However, these are not simple tasks nor are they small projects. Even an analysis of a short work can take several hours to complete, many more if a formal report is needed.
Still, when I give proposals for a plagiarism analysis, many are taken aback by the time and costs involved. I often get the question, “Can’t you just Google it?” or something to that effect.
The problem is that, while plagiarism detection technology certainly has made it easier than ever to detect copied text, it doesn’t actually make a determination about what is or is not plagiarism. In fact, leaning too heavily on automated plagiarism detection is one of the most common mistakes people make, creating a very real issue with false positives and false negatives that only a human can sort out.
So what is involved in a thorough plagiarism analysis? I’ll explain below.
The Three Types of Analysis
Typically, there are three different scenarios for which a plagiarism analysis is used.
- A Believed Original Work is Checked for Suspicion
- A Suspicious Work is Checked Against Unknown Sources
- A Suspicious Work is Checked Against Known Sources
It’s easy to see how a single case of plagiarism can actually need all three types of analysis. For example, if a work without suspicion receives a quick analysis through an automated plagiarism detection system, it might call for additional attention and then move into a case where it’s compared against a suspiciously similar source.
However, all three analyses are slightly different. The first, for example, is usually just a quick automated check. The second is more thorough, but it checks against all known and available sources. The final one is a very specific, hands-on search that compares two or more works side by side.
That being said, of the three types, only the third is where the question, “Is this plagiarism?” is answered with any meaning.
So, with that in mind, here’s a look at how those analyses are done by me and, to my understanding, by most others who do them.
Doing the Analysis
The reason that one cannot simply Google an analysis or punch a work into a machine and get a definite answer is that machines can only detect copying, not plagiarism. However, they don’t even do a good job detecting copying as they can only spot exact copies, not paraphrases or other alterations.
Also, machines can’t distinguish between copying that is likely plagiarized and copying that is mere coincidence. Likewise, they can’t easily check citations to find unattributed lifting.
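To make that limitation concrete, here is a minimal sketch of the kind of exact-match comparison an automated check relies on. It is not how any particular detection product works; the function names, the five-word window and the sample sentences are all my own assumptions for illustration.

```python
import re

def word_ngrams(text, n=5):
    """Return the set of lowercase n-word runs found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(suspect, source, n=5):
    """Return the n-word runs that appear verbatim in both texts."""
    return word_ngrams(suspect, n) & word_ngrams(source, n)

# Hypothetical sample texts, invented purely for this illustration.
source = "The quick brown fox jumps over the lazy dog near the river bank."
verbatim = "As noted, the quick brown fox jumps over the lazy dog every day."
paraphrase = "A fast, dark-furred fox leaps across a sleepy hound by the water."

print(len(shared_ngrams(verbatim, source)))    # several shared runs
print(len(shared_ngrams(paraphrase, source)))  # no shared runs
```

The verbatim sentence shares several five-word runs with the source, while the paraphrase shares none even though it lifts the same idea. The reverse problem shows up if the window is made shorter: common phrases start matching by coincidence.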
As such, there’s much more to a plagiarism analysis, including, typically, the following steps.
- Automated Analysis: The work is put through some form of automated plagiarism detection, possibly first a broad one against all known sources and then a more specific one against the one or two suspected sources. This provides general information on suspect areas, percent of work copied and so forth.
- Citation Analysis: Citations are then checked for the matching content and any properly attributed work, generally, is removed from consideration. Exceptions include cases of citation plagiarism (lifting citations from an earlier work) and situations where it seems the author copied from a source that used the citation correctly first.
- Common Phrases Discarded: Next, since most plagiarism detection systems, especially sensitive ones, report short, common phrases as being matches, sentences that are likely mere coincidence need to be removed.
- Paraphrasing/Rewording Detected: The opposite problem is that plagiarism detection systems won’t detect paraphrased or otherwise altered plagiarism. This makes it critical to read the works involved carefully, using the automated analysis as a guide, to find areas of likely paraphrasing and rewording.
- Other Parts Added/Removed From Consideration: Depending on the nature of the works, other passages may be added or removed from consideration. For example, in cases of law school plagiarism, certain passages might be non-common but required. Those need to be removed. Other signs are also considered, such as odd grammar mistakes the works have in common.
- New Tallies Are Made: With all additions/removals made, a new tally is made to decide how much of the work is suspected of being copied without attribution.
- Analysis Made: The determined amount of unattributed copying is then weighed against the ethical standards the work had to live up to in order to decide if it would likely be considered plagiarism.
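The bookkeeping behind the steps above can be sketched roughly as follows. This is only an illustration of how the tally might be tracked, not a tool I am describing as my actual workflow; the `Passage` fields, the function names and every number in the example are hypothetical. The human reviewer's decisions deliberately override the automated flags, since that is the point of the manual steps.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    words: int
    flagged_by_tool: bool = False    # reported by the automated analysis
    properly_cited: bool = False     # removed during citation analysis
    common_phrase: bool = False      # likely coincidence, discarded
    reviewer_added: bool = False     # e.g. paraphrasing found by reading
    reviewer_excluded: bool = False  # e.g. required, non-original language

def is_suspect(p):
    """Decide whether a passage still counts after human review."""
    if p.reviewer_excluded:
        return False
    if p.reviewer_added:
        # Covers exceptions such as suspected citation plagiarism.
        return True
    return p.flagged_by_tool and not (p.properly_cited or p.common_phrase)

def suspect_share(passages, total_words):
    """Fraction of the work suspected of unattributed copying."""
    suspect_words = sum(p.words for p in passages if is_suspect(p))
    return suspect_words / total_words

# Made-up figures for a hypothetical 1,000-word work.
passages = [
    Passage(120, flagged_by_tool=True),
    Passage(40, flagged_by_tool=True, properly_cited=True),
    Passage(6, flagged_by_tool=True, common_phrase=True),
    Passage(80, reviewer_added=True),
    Passage(30, flagged_by_tool=True, reviewer_excluded=True),
]
print(f"{suspect_share(passages, 1000):.0%} of the work is suspect")
```

Note that the resulting percentage is only an input to the final step. Whether that amount of unattributed copying counts as plagiarism still depends on the standards the work was held to, which is a judgment no tally can make.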
At this point, the findings are usually presented and, if needed, the formal report is drafted.
Needless to say, this process is time-consuming, tedious and requires an intimate knowledge of the works. I typically do these analyses with multiple note files, spreadsheets and other documents. Larger projects often require the help of one or more other people.
Even in cases where the plagiarism is obvious, the analysis needs to be done both to prove that it is as obvious as it seems and to show just how deep it goes. Very few cases don’t require this level of work and those usually don’t require an analysis at all.
This, in turn, is why even plagiarism analyses on shorter works can take several hours and longer ones can often take weeks.
Accusing someone of plagiarism is almost never a simple matter. Such an accusation can have dire consequences for that person’s future, either academically or professionally. It is not something that should be taken lightly or simply handed off to a machine, no matter how good its “plagiarism detection” is.
Machines don’t understand the complexities of paraphrasing, citation and the general ethics of plagiarism. Those are decisions humans have to make.
This is why true plagiarism analysis requires so much work and takes so long to do properly.
Without that time and effort, the risk of false positives or false negatives is simply too great, all with an accusation for which there should never be any mistake.