Dr. Debora Weber-Wulff, who is both a professor at the HTW Berlin in Germany and the author of the great English-language blog Copy, Shake, Paste, has announced the results of her 2009/2010 plagiarism checker tests, and PlagAware, a little-known service from Germany, has taken top honors.
All in all, the tests put some 48 different plagiarism checkers through 42 different tests, which included English, German and Japanese language tests involving whole plagiarism, edited text, translations and a few originals. Of those 48 systems, 26 were able to complete the tests and earn a final grade.
The final grade was determined by how well the checker performed on the tests, how professional it was and how usable it was in an academic environment, specifically its workflow and how quickly it returned results. The checkers were then grouped into three classes: “Partially Useful”, “Barely Useful” and “Useless”.
Since none of the plagiarism checkers scored above 70% on Dr. Weber-Wulff’s tests, none of the services were rated “Useful”; the best grade awarded was the equivalent of a C+.
However, the test may have also exposed several other problems with automated plagiarism checkers, issues that could directly impact content creators seeking to find a service to track their work.
Problems and Interesting Results
The biggest gap in all the plagiarism checkers was the inability to locate translated plagiarism. While this is widely expected as the technology to make such detections simply is not there, it’s a hole in coverage that has remained since Weber-Wulff performed her first round of tests in 2004.
A more unusual and less-expected gap was the lack of coverage of Google Books. In every checker, a test document copied entirely from Google Books registered as no more than 25% plagiarized. It appears that the Google API, upon which many of these services rely, does not cover Google Books, and that makes searching for plagiarism from books very difficult.
Also, umlauts and other non-English characters continued to present challenges to many plagiarism checkers, though much less so this time than in previous tests, indicating a better effort to internationalize plagiarism checkers.
Finally, according to Dr. Weber-Wulff, the spike in new plagiarism checking services has brought with it a number of services that appear to be less than honest about their intentions, including Viper Plagiarism Checker, which Weber-Wulff suspects is using its plagiarism checking service to harvest essays for its related essay writing service.
The “Barely Useful” category was made up of Plagiarism Finder, Docoloc, Copyscape, Blackboard/SafeAssign, Plagiarisma, Compilatio, StrikePlagiarism and The Plagiarism Checker (better known as Dustball).
The “Useless” category was (not linking to these as some are dubious) iPlagiarismCheck, Plagiarism Detector, Un.Cov.Er, Genuine Text, Catch it First, Plagium, Viper, Plagiarism Search, Grammarly, Percent Dupe, Plagiarism Checker and Article Checker.
Dr. Weber-Wulff made it clear that her results are geared toward a very specific usage scenario, namely use in a German university. She also felt that even the most useful checkers were not ideally suited for checking every single student paper submitted, but rather, were useful when a professor had a suspicion of plagiarism and wanted to use an automated system to help track it down.
Still, the results are interesting, and they show that smaller companies can, in at least some situations, be better than larger ones for plagiarism detection. The two biggest players, Turnitin and Blackboard, came in second and eighth respectively. They also show that there is a lot of fluidity in the market, as Copyscape, the winner of the last round of checks, was in the “Barely Useful” category and was seventh overall.
Primarily though, they show what I’ve known all along: that the bulk of plagiarism checkers are garbage. I’ve said as much about some of the “Useless” services, including Un.Cov.Er and Viper, and about the “Barely Useful” service The Plagiarism Checker.
But as interesting as the results are, their application to readers of this site is actually fairly limited, and the reasons are pretty simple.
Limitations and Caveats
Dr. Weber-Wulff made it clear that her research was targeted at one use case, namely that of a German university. However, she did strive to make the results more applicable to other uses by including other languages and various plagiarism types.
Still, readers of this site who are working to track their writing may not want to read too deeply into the results and should use them more as a general guide. There are several reasons for this.
- Usage Scenario: There are two types of plagiarism detection. The first is testing a work of unknown origin for authenticity and the second is finding copies of a known authentic work. This test looks at the first scenario, whereas most readers of this site need the second, and the two require different skills. This may explain why Plagium performed so poorly in these tests, but reasonably well in mine.
- Language: The primary testing language was still German. Even though the test included both English and Japanese checks, the results will likely skew toward checkers with strong German-language checking.
- Usability Requirements: Many checking their work for plagiarism won’t have the same usability concerns that a professor running through 200 student papers will. So usability issues that sank some of the checkers may not affect you.
That being said, Dr. Weber-Wulff’s tests are definitely a good guide and a good starting point. That’s why, over the next month or so, I will be going through many of the plagiarism checkers that took top honors in her tests and seeing how they do at tracking content for a content creator.
At the very least, the results are a solid indication of how well the algorithms in these checkers work and how large their databases are. That alone is reason enough to give them a closer look.
Quickly, I want to thank Dr. Weber-Wulff and her student assistant, Katrin Köhler, for performing these checks. The two of them spent more than nine months on the tests and are still not 100% done. I also share their hope that the German government, or another government, might take up the cause of funding these checks in the future, so that her ability to continue would not be tied to one university and the funding difficulties that come with that.
Though these tests aren’t perfect, in that they are not all things to all people, they are important and useful because they provide an apples-to-apples comparison between the various checkers that were tested.
And that, in turn, is how I treat the results: not as gospel on which plagiarism checker to use, but as an unbiased test that compares the various services side-by-side in one usage scenario.
When treated that way, the tests become very useful and an important tool in determining which plagiarism checkers to look at.
Photo Credit: c. 2011: Axel Völcker, DerWedding.de