Last week, I received another email from Gideon Greenspan of Copyscape telling me that he and his team had made further tweaks to the system that should, in theory, drastically improve the service's ability to detect cases of mass plagiarism.
The service had already shown promising improvement after an unimpressive first review. However, even though the product was much better than before the changes, there were still caveats that prevented me from offering an unrestrained endorsement.
So the question became whether the new improvements would be enough to eliminate those caveats. I decided to find out and grabbed my test works for a third round of Copyscape testing.
Twice As Good
Going into the first round of improvements, it was hard to do much worse: on the poem Teardrops, Copyscape had failed to find any matches. On the second pass, however, Copyscape’s performance improved drastically, catching 10 of 25 potential results.
While that was a drastic improvement, it still left over half of all copies (15 of 25, or 60%) undetected.
However, with the latest changes, Copyscape more than doubled the number of copies it detected. On the first poem, it caught 31 copies out of a potential 36. On the second poem, Copyscape caught 41 results out of what Google listed as 97 potential results. (Note: That number seems incredibly high, even to me, and I am currently investigating it.)
See the results from all three tests below:
Some of the bump in the third test is due to increased plagiarism of the poem, largely by one or two people with whom I am working on a resolution. But even discounting the increased number of targets, the accuracy has gone way up. Where Copyscape previously caught fewer than half of all potential results, it now caught over 86% (31 of 36).
The results on the second poem are much worse, at about 42% of the potential results (41 of 97), but that seems to be due at least in part to some strange behavior on Google’s end with these results.
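For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic in Python. The counts are the ones reported in this post; the labels and the `detection_rate` helper are my own illustrative names and have nothing to do with Copyscape itself.

```python
# Counts reported in this post: (copies caught, potential copies).
# The labels are illustrative names chosen for this sketch.
results = {
    "Teardrops, second test": (10, 25),
    "First poem, third test": (31, 36),
    "Second poem, third test": (41, 97),
}


def detection_rate(caught, potential):
    """Return the share of potential copies that were caught, as a percentage."""
    return 100.0 * caught / potential


for label, (caught, potential) in results.items():
    rate = detection_rate(caught, potential)
    print(f"{label}: {caught}/{potential} = {rate:.1f}%")
```

The rates are only as reliable as the "potential" totals behind them, which come from Google result counts. That is why the 97 figure for the second poem deserves some skepticism.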
All in all, it is clear that Copyscape is now catching far more copies than it was previously and that the service is continuing to improve at a very rapid pace.
In a few short weeks, Copyscape has gone from a near-total dud at detecting mass plagiarism to what can now be called an impressive tool. Though further testing is needed to find out exactly how effective it is with other types of works, the improvements are more than obvious. (Note: Copyscape detected all copies of two short stories submitted; however, both had only a few copies on the Web.)
Copyscape is a tool and it should be used as such. It should not be relied upon solely but, in its current state, can definitely be used in conjunction with other detection methods such as Google Alerts and Digital Fingerprints. It now seems to have a place in a well-rounded plagiarism detection strategy.
Longer works, clearly, will get more out of Copyscape than shorter ones. That is because, when searching by hand for a single phrase, the longer the work is, the more likely it is that the phrase searched for was not included in the plagiarized copy. Also, with longer works, it can be harder to find a good statistically improbable phrase to search for, which makes it easier to let Copyscape do the legwork.
However, if you do use Copyscape, it would probably be best to go ahead and spend the few dollars for Copyscape Premium. The ten-result limit on the free search is crippling for works that either have large-scale plagiarism issues or generate a great deal of legitimate reuse.
I set up my account before the first test with five dollars and still have well over half of my searches left. It is probably the best deal available in plagiarism detection.
Any time you use an automated service such as Copyscape, you make a trade-off. You give up some control and effectiveness in exchange for speeding up the process. Copyscape has now hit a point where the trade-off is likely worthwhile.
Your mileage may vary and these tests are limited. However, other test results have shown a similar level of satisfaction with Copyscape.
In the not-too-distant future, I plan on doing a thorough analysis, similar to Dr. Weber-Wulff’s, to determine how effective the various Web plagiarism tools are at finding all of the duplicate content out there and to rank them accordingly.
Stay tuned for more information.