Using Citations to Detect Plagiarism
As nearly every test of plagiarism detection systems has shown, there is a gaping hole when it comes to spotting some forms of misuse.
The problem with modern systems, including those that use fingerprinting and string matching, is that they can only detect copied text. Since the definition of plagiarism often goes well beyond mere verbatim copying and includes translated works and even just taking the idea, these systems are ineffective against much of it.
This came to light in a very stark way earlier this year when a major comparison of modern plagiarism detection systems found that none could detect translated plagiarism effectively.
However, three researchers (Bela Gipp, Norman Meuschke and Joeran Beel, all from OvGU, Germany and UC Berkeley) may have found an alternative approach that could detect such plagiarism. This approach involves looking at the citations of the works involved to decide how similar they are and whether plagiarism likely occurred. While it may be somewhat limited in its usefulness, it could be a powerful new tool in plagiarism detection, especially in certain circumstances.
The Basics of Citation Plagiarism Detection
The phrase “citation-based plagiarism detection” at first sounds like an oxymoron, as citations are rarely associated with plagiarized works. However, most academic works, including plagiarized ones, come with citations because they are required. The problem is that those citations are often lifted straight from the source material, along with other elements of the work.
Those citation similarities often remain even if the plagiarist modifies the content of the work in other ways. For example, a translation plagiarist might run the entire work through a human or automated translation system, thus completely defeating all text matching applications. The citations, however, would largely remain unchanged, offering clues to the misuse.
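As a rough illustration of the idea, the sketch below compares two documents purely by the sources they cite, ignoring their text entirely. It is a minimal example and not the researchers' actual algorithm: the function name `citation_overlap`, the use of Jaccard similarity as the measure and the citation keys are all assumptions made for illustration.

```python
def citation_overlap(doc_a, doc_b):
    """Jaccard similarity between the sets of sources two documents cite.

    Each document is given as a list of citation keys (hypothetical data).
    Returns a value between 0.0 (no shared sources) and 1.0 (identical sets).
    """
    set_a, set_b = set(doc_a), set(doc_b)
    if not set_a and not set_b:
        return 0.0  # no citations at all, so there is nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical citation lists. The suspect could be a translation of the
# original: the wording is entirely different, but the sources are nearly the same.
original = ["Smith2001", "Lee1998", "Garcia2005", "Chen2003", "Patel2007"]
suspect = ["Smith2001", "Lee1998", "Garcia2005", "Chen2003", "Novak2010"]
unrelated = ["Brown1995", "Smith2001", "Kim2009"]

print(f"original vs suspect:   {citation_overlap(original, suspect):.2f}")
print(f"original vs unrelated: {citation_overlap(original, unrelated):.2f}")
```

A high score proves nothing on its own. It only flags a pair of documents for a closer look by a human reviewer.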
To test this theory, the researchers analyzed the thesis of former German Defense Minister Karl-Theodor zu Guttenberg, who resigned in disgrace after it was revealed that his thesis was plagiarized, to see how well the citation-based approach detected the various types of plagiarism within it.
The results were that, while citation checking was not able to detect many of the copied-and-pasted passages, largely because those passages were so short, it did better at detecting disguised plagiarism and translated plagiarism within the work.
In short, where text matching tools detected only around 10% of disguised plagiarism, using citations detected 30% of the known cases. Likewise, automated tools detected less than 5% of translated plagiarism, whereas a citation comparison caught over 80% of it.
Considering how heavily studied the Guttenberg plagiarism case has been, it was an impressive result, but not one without limitations.
Advantages and Limitations of Citation-Based Plagiarism Detection
The first and most obvious problem with citation-based plagiarism detection is that it requires citations, and lots of them. While it’s fine for a research paper, a dissertation or a thesis, it isn’t going to work on anything literary or even on shorter academic works that only have a handful of citations.
Second, it also won’t work, or at least won’t work as well, in cases where citation overlap is expected, such as fields where there are a limited number of sources. Though citation ordering may help some, the evidence obviously won’t be as compelling as in cases where there’s little reason for any citation overlap.
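One possible way to take citation ordering into account is sketched below. It measures how many citations two documents share in the same relative order, using a standard longest common subsequence computation. The choice of that measure, the function name `shared_order_length` and the single-letter citation keys are assumptions for illustration, not a description of the researchers' method.

```python
def shared_order_length(seq_a, seq_b):
    """Length of the longest common subsequence of two citation sequences.

    Counts how many citations appear in both documents in the same
    relative order, allowing other citations in between.
    """
    rows, cols = len(seq_a), len(seq_b)
    # table[i][j] holds the answer for the first i items of seq_a
    # and the first j items of seq_b.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


# Hypothetical case: all three documents cite the same five sources, as
# might be expected in a narrow field, so set overlap alone says little.
source = ["A", "B", "C", "D", "E"]
same_order = ["A", "B", "X", "C", "D", "E"]  # same sequence, one extra citation
shuffled = ["E", "C", "A", "D", "B"]         # same sources, different order

print(shared_order_length(source, same_order))
print(shared_order_length(source, shuffled))
```

When overlap is expected anyway, a long run of citations in matching order is harder to explain by coincidence than the overlap itself, though it still calls for human judgment.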
Finally, it also can’t help much in cases of verbatim plagiarism. If the plagiarism consists of short passages, it simply won’t work. If the plagiarism involves longer passages, then the existing systems are simply more effective.
Still, in situations where it is a good fit, namely cases with a large amount of suspected disguised or translated plagiarism and a fair number of citations, it can work very well and do more than any other approach currently available.
Bottom Line
Citation-based detection is by no means a replacement for text matching and other forms of plagiarism detection. Likewise, it does nothing to actually reduce or eliminate the need for a human evaluation.
Still, citation-based detection is likely the best weapon we have right now against translated plagiarism and against extremely well-disguised plagiarism. In that regard, for these kinds of cases alone, citation-based plagiarism detection could be a great addition to the plagiarist-fighter’s toolbox.
Hopefully we will see some new tools and new services that take this approach. It could give anti-plagiarists everywhere an important leg up in a very difficult area of the fight.