When it comes to tracking content across the Web, Copyscape is, for the most part, the brand name to know.
This reputation has been well earned. It recently took top honors in a round of plagiarism checker testing that pitted it against several much more expensive services.
However, competitors have begun to emerge. Some, such as FairShare, offer more features and more free results, and others, such as CopyGator, offer great convenience. Despite this, especially for static content, Copyscape has remained the gold standard.
But a new service hopes to provide a new challenge. Plagium, a copy detection system by Septet Systems, provides a very similar service to Copyscape but adds additional free features and uses Yahoo! rather than Google to perform its searches.
The question is how it stacks up. To find out, I put the service through a battery of tests, using my well-copied and plagiarized literary works as the measuring stick.
The comparisons between Plagium and Copyscape are obvious. However, the default interface of Plagium does not ask for a URL to be checked, as Copyscape does, but provides a textbox to paste your text into. Though this is less convenient, in my experience it actually provides better results, as the plagiarism checker examines only the content and not the surrounding text (navigation, footer, etc.).
However, if you prefer the convenience of just providing the URL, you can click the “Check URL” link and get a more Copyscape-like interface.
Plagium’s results add an interesting new feature called the “Timeline”, which shows roughly when the various reuses went online. This lets you prioritize your actions based upon either the most recent or the least current matches. However, as neat as the feature is, it can get cluttered on works that have a lot of copies and it isn’t exactly clear in the beginning what all of the elements mean, especially the sizes of the bubbles.
However, the most powerful feature of Plagium is its alert system. If you register for a free account, you can have the service track your text and alert you in a weekly email to any new copies it finds. You can also subscribe to an RSS feed of the results.
With this feature, Plagium becomes something of a FairShare targeted at static content. Where FairShare requires an RSS feed to parse (though there are hacks that can be used to get static content into the system), this can work on any text that can be pasted into the system.
What is amazing about this is that Copyscape only offers the URL search and ten results for free. Its paid accounts, at five cents a search, allow users to paste text and receive unlimited results. It also provides a sentry service, which monitors 10 pages once a week for about $5 per month.
However, Plagium currently offers all of these features for free. A representative for the company said that they are providing it for free to “attract paying customers for custom information tracking system development work,” though the site does also accept donations.
But not much of this matters if the plagiarism detection isn’t up to code. So I decided to put the system to a quick test to see how it handles some of my most plagiarized works.
For the purpose of this test I ran five of my works through both Plagium and Copyscape (using the text paste feature) and, as a baseline, I ran a statistically improbable phrase from each work through Google.
In each case I attempted to verify that at least most of the results were not false positives. However, it is possible that there are some non-matches or additional duplicates included within the mix.
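For anyone who wants to repeat the Google baseline, here is a minimal sketch of how an exact-phrase search URL might be built. The function name and the sample phrase are my own placeholders, not anything taken from the services tested, and it assumes the standard `q` parameter on Google's search page.

```python
from urllib.parse import urlencode


def exact_phrase_query(phrase):
    """Build a search URL that looks for the phrase as one quoted string."""
    # Collapse whitespace so line breaks inside a poem do not split the phrase.
    cleaned = " ".join(phrase.split())
    # Double quotes ask the search engine for an exact-phrase match.
    return "https://www.google.com/search?" + urlencode({"q": f'"{cleaned}"'})


# Hypothetical phrase for illustration only, not a line from the tested works.
print(exact_phrase_query("a  hypothetical\nline from the poem"))
```

The phrase still has to be picked by hand: the idea is to choose a run of words unlikely to appear anywhere except in copies of the work.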
The results of the tests are below:
The first poem was a 224-word poem that was known to be widely plagiarized.
The first test showed that Plagium found approximately 17% more matches than Copyscape. Copyscape, for example, did not find my own site though Plagium listed it first. The page is listed in Google.
Still, the Google results handily trumped both and provided a large number of additional results. However, the actual number of results is far lower than the number reported, as many of the Google results appear to be duplicates where the same page had multiple URLs.
The second poem is a 279-word poem also known to be heavily plagiarized.
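The duplication problem, where one page shows up under several URLs, can be roughly corrected for by normalizing each URL before counting. This is a sketch under my own assumptions (ignore the scheme, a leading "www.", trailing slashes and query strings); the URLs are made up and this is not how any of the tested services work.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a (host, path) pair so trivial variants compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/poem/" and "/poem" as the same page; keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Assumption: the query string and fragment do not select a different page.
    return host, path


def count_unique(urls):
    """Count results after collapsing URL variants of the same page."""
    return len({normalize(u) for u in urls})


# Hypothetical result list: three variants of one page plus one other page.
results = [
    "http://www.example.com/poems/one/",
    "http://example.com/poems/one",
    "https://example.com/poems/one?ref=feed",
    "http://example.org/copied-poem",
]
print(count_unique(results))  # 2
```

Dropping the query string is the risky part, since some sites use it to select the page itself, so treat the count as an estimate rather than an exact figure.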
In this test, Plagium outperformed Copyscape by over 100%. However, Plagium does suffer from some duplication issues. For example, my site has two pages listed with the work on it though, once again, it doesn’t appear at all in Copyscape. However, even with this, there are far more unique results in Plagium.
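To make the percentage concrete, "over 100%" means Plagium returned more than double Copyscape's count. A small sketch of the arithmetic, using the 9 matches Copyscape found for this poem; the Plagium counts in the loop are illustrative thresholds, not the measured figure.

```python
def percent_more(found, baseline):
    """Return how many percent more matches `found` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (found - baseline) / baseline * 100


copyscape_matches = 9  # Copyscape's count for the second poem

# Hypothetical Plagium counts, chosen to show where the 100% line falls.
for plagium_matches in (18, 19):
    gain = percent_more(plagium_matches, copyscape_matches)
    print(plagium_matches, "matches is", round(gain, 1), "percent more")
```

In other words, against a baseline of 9, anything above 18 matches counts as outperforming by over 100%.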
Google once again trumped both of them, but the duplication in Google makes that useful only as a baseline, not as an exact number.
For this test I used a 1550-word short story with very limited reuse.
(*)In this test all three essentially tied. The difference between the 5s by Plagium and Google was the four matches they found on my site. All three found the exact same reuse, which is a legitimate copy of the work on another site.
In this case, all three performed the same.
For this test, I used a 785-word short story with a modest amount of known reuse.
In this case, Copyscape was the clear winner. Not only did Plagium return fewer results, but the six results were really just 2 as 4 results were from my site and the other 2 from the same forum. Copyscape, on the other hand, delivered 10 matches, at least 4 of which were unique.
Google’s results, on the other hand, contained 20-25 duplicates, making its number closer to the mid 20s.
For this test I used a 202-word prose piece with a moderate amount of known plagiarism.
In this case, Plagium found three unique matches, including my site, that were not in Copyscape. Google did find more matches than both, but once again there was a serious duplication issue. At least nine items in Google’s results were duplicates, meaning that the number is closer to 15-18 results.
Still, this was a clear case where Plagium found results that Copyscape missed.
In all five tests, Google outperformed both Plagium and Copyscape. However, it contained a very high number of duplicate results and the benefit was likely minimal. In the contest between Plagium and Copyscape, Plagium found more matches in three of the tests, Copyscape did better in one and they tied in one.
It appeared to me that Copyscape was not producing the number of matches it once did. The second poem, for example, is the same one I used when comparing Copyscape to itself in 2007. In that testing, it first found no results, then ten results, then 31. With today’s test, it found 9 even though the actual number of copies has remained fairly flat.
Whether this is because Copyscape does not work as well with pasted text (the first tests were done with the URL function) or because changes have limited the results it is producing, it is clear that it is not as effective as it once was for finding all of the results for a work.
However, it is important to note that this is far from a comprehensive comparison of the two services. These are just five very limited cases. Everyone else’s mileage will vary.
In the end Plagium’s results were very solid and it actually performed better than Copyscape in most tests. Whether this is a fluke or a sign of something greater remains to be seen.
However, since Plagium is completely free, there’s no harm in trying it out and I actively encourage you to do so. You can also experiment with the alerts feature and see if it works well for your content (I haven’t seen any results yet in the few that I set up).
Though I’m not ready to recommend Plagium as the sole plagiarism checker one should use (I don’t think I’ll ever reach that point with any product), it is a very solid addition that pulls in some very competitive matching numbers.
If Plagium isn’t a part of your plagiarism detection toolbox, it should be. The results are solid from what I’ve seen, the features are very powerful and, best of all, it is completely free. You can’t ask for much more out of a plagiarism checker.
Personally, I’ll probably start relying more on Plagium for my static content and continue to use FairShare for items already within an RSS feed. This works well with the intentions and limitations of the two services.