Writing a plagiarism detection tool is one of those things that many people will attempt but few will do well. It seems simple enough, especially with the APIs provided by the various search engines, but it is actually a very complicated task that requires a great deal of work to do well.
Because of this, there are many plagiarism checkers that pop up and turn out not to be worth anything. They either produce low-quality results, barely offer any useful functionality or barely work at all.
But a quick review of PlagSpotter’s site shows it isn’t the usual “Johnny Come Lately” plagiarism checker. Not only is the site professional and clean, but it states clearly what its goal is right on its home page, “To create an improved and perfected version of another well-known alternative duplicate content checker tool, Copyscape.”
But does PlagSpotter live up to its stated goal? I decided to put it to the test and see how it stacked up against its competitor, in both Copyscape's free and paid versions. What I found was surprising, but it raises more questions than it answers.
What is PlagSpotter?
On the simplest level, PlagSpotter is a new copy detection tool being developed by Devellar, a Ukrainian company that also has a satellite office in New York City.
The functionality of PlagSpotter will be familiar to anyone who has used Copyscape. You provide the URL of the page you want to check for duplicates, and PlagSpotter analyzes the content on that page for any matches. There is no charge for one-off searches and no obvious limit to the number of results that PlagSpotter will return. This is in sharp contrast to Copyscape, which limits free searches to 10 results.
Also like Copyscape, there’s the ability to pay for automated and recurring detection, with plans ranging from $8 per month for 10 URLs and weekly checks to $50 per month for daily checks of up to 50 URLs. All plans come with a 14-day free trial.
But despite the price tag, the developers say that PlagSpotter is currently a beta product and one under active development.
Indeed, PlagSpotter is clearly very new. The domain was created on June 7th of this year and the press release announcing its opening was posted on the 8th of this month.
Still, with the service already charging money, it raises the question of whether or not PlagSpotter works. I decided to take a look and put the site through a battery of tests to see how well it worked.
The first thing I noticed about PlagSpotter is that its interface, though similar to Copyscape's in function, is very different aesthetically. Where Copyscape is more utilitarian and Google-like, PlagSpotter is bright and colorful, emphasizing its design and visual appeal.
But once I hit submit on my first URL, I ran into one of the bigger problems I had with PlagSpotter: its speed.
PlagSpotter was far slower than Copyscape or most other plagiarism checkers in this category. On some tests, the progress bar seemed to crawl across the screen or outright freeze. I timed one of my tests, the Declaration of Independence test, as taking over 32 seconds to complete.
While this may not seem like a long time, in many cases I was able to start my PlagSpotter test, sign out of Copyscape, sign back in, start the test there and get my results, all before PlagSpotter finished.
This can be especially frustrating when checking larger works or running several checks back-to-back.
The results page was also unlike anything I had seen in this particular niche. Rather than providing a simple list of results, PlagSpotter displayed a copy of the checked content with the passages that were matched elsewhere highlighted. From there, clicking a highlighted passage shows a list of sources where it was found; alternatively, clicking a link in the “Sources Found” section makes PlagSpotter highlight all of the passages found on that site.
While this approach was organized and useful, it took some adjustment. It also meant that there was no view of the matching page with the copied text highlighted, which is often very important when analyzing a page as a potential infringement.
Still, despite these drawbacks, once this system was mastered, it was a convenient way to view, organize and prioritize cases.
However, the UI doesn’t matter much if the matching doesn’t work properly. To test that, I put it up against Copyscape (both paid and free) in a series of 9 tests covering different types of text. Below are the results.
As is usual when testing a new plagiarism/copy detection system, I put it through a short battery of tests to see how many copies it can detect of works whose level of infringement is relatively well known.
This time around, I performed nine tests and, since PlagSpotter made no secret that it’s targeting Copyscape, I compared its results against both Copyscape’s free searches and Copyscape Premium.
For each test, I looked at the number of matches each service found (as self-reported; a deeper analysis follows) and, since PlagSpotter provides a percentage of the work that is copied, I included that as well. Bear in mind that all of these works have at least one perfect copy on the Web so, ideally, the percentage should be at or near 100% if the figure is accurate.
Here were the results of those tests, by the numbers:
| Test | PlagSpotter matches | Copyscape Premium matches | Copyscape Free matches | PlagSpotter % copied |
|---|---|---|---|---|
| Test 1: Business FAQ | 33 | 51 | 10 | 17% |
| Test 2: Marketing Content 1 | 19 | 38 | 4 | 20% |
| Test 3: Marketing Content 2 | 24 | 23 | 10 | 83% |
| Test 4: Poem 1 | 20 | 29 | 10 | 100% |
| Test 5: Poem 2 | 21 | 21 | 10 | 94% |
| Test 6: Short Story | 32 | N/A | 10 | 95% |
| Test 7: Blog Post 1 | 66 | 17 | 8 | 70% |
| Test 8: Blog Post 2 | 33 | 21 | 10 | 94% |
| Test 9: Declaration of Ind. | 91 | 160 | 10 | 100% |
(Note: I am willing to provide the full URLs used for all tests to either party if they want to replicate the tests.)
As you can see, PlagSpotter and Copyscape Premium each found the most matches in four of the nine tests, with one tie. Apart from Copyscape Premium's Test 6 error, both applications were far better in every test than Copyscape’s free version, which is hard-limited to just ten results.
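For anyone who wants to double-check that tally, here is a short Python sketch that recomputes it from the match counts in the table. It compares only PlagSpotter and Copyscape Premium (the free version is capped at ten results), treats Premium's failed Test 6 run as zero matches, and counts raw reported matches, so false positives are included exactly as they are in the table.

```python
# Match counts (PlagSpotter, Copyscape Premium) transcribed from the table above.
# None marks Copyscape Premium's failed run on Test 6 (reported as N/A).
results = {
    "Test 1: Business FAQ": (33, 51),
    "Test 2: Marketing Content 1": (19, 38),
    "Test 3: Marketing Content 2": (24, 23),
    "Test 4: Poem 1": (20, 29),
    "Test 5: Poem 2": (21, 21),
    "Test 6: Short Story": (32, None),
    "Test 7: Blog Post 1": (66, 17),
    "Test 8: Blog Post 2": (33, 21),
    "Test 9: Declaration of Ind.": (91, 160),
}


def tally(results):
    """Count which service reported more raw matches in each test."""
    counts = {"PlagSpotter": 0, "CS Premium": 0, "Tie": 0}
    for plagspotter, premium in results.values():
        # A failed run (None) is treated as zero matches.
        premium = premium or 0
        if plagspotter > premium:
            counts["PlagSpotter"] += 1
        elif premium > plagspotter:
            counts["CS Premium"] += 1
        else:
            counts["Tie"] += 1
    return counts


print(tally(results))
```

Keep in mind that a "win" here only means more reported matches; as the notes on Test 7 show, a higher count can be mostly false positives.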
However, as is usually the case, the numbers tell only part of the story. Here’s a roundup of some of the strange things that happened during the tests.
- Tests 1 & 2: PlagSpotter completely missed on these, reporting barely a fifth of the text as copied despite perfect copies being available. This gave me very low expectations for the rest of the tests.
- Tests 3, 4, 5 & 8: The strongest results for PlagSpotter, with high reported percentages and few false positives; it won or tied Tests 3, 5 and 8 outright. It behaved like a completely different checker, so much so that I retested the first two to see if there was an error, but the results repeated.
- Test 6: Copyscape Premium suffered a serious error and could not match any of the content in the post. Instead, it provided nothing but 40 or so false positives for generic text near the comment form. Copyscape Free had no such trouble, but was limited to 10 results. This test was repeated several times with no change.
- Test 7: Though it seems PlagSpotter blows Copyscape out of the water here, in truth nearly all of its 66 matches were false positives: phrases too short to count. Also, despite those 66 matches, including several perfect copies, it claimed only 70% of the work was copied.
- Test 9: The DOI (Declaration of Independence) test is meant to “stress test” a plagiarism checker, since thousands of copies are available online. No system can return them all; the question is how many each can return. Copyscape Premium won this one, with few obvious false positives.
In the end, though the test numbers paint the picture of a service that’s already on par with Copyscape Premium, my experience doesn’t bear that out. When PlagSpotter “missed,” it missed far worse than Copyscape Premium did (the Test 6 error aside). Even when it returned more results, it had a bigger problem with false positives, and it reported more than 90% of the identical material as copied in only five of the nine tests. Copyscape, on the other hand, did so reliably every time (again, except for the error), finding at least one complete or near-complete match.
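The 90% figure can be checked the same way. This sketch simply counts how many of PlagSpotter's self-reported "% copied" figures from the table exceed 90%; the threshold is the one used in the paragraph above, and nothing here measures how accurate those percentages actually are.

```python
# PlagSpotter's self-reported "% copied" figure for each test, from the table.
percent_copied = {
    "Test 1": 17, "Test 2": 20, "Test 3": 83,
    "Test 4": 100, "Test 5": 94, "Test 6": 95,
    "Test 7": 70, "Test 8": 94, "Test 9": 100,
}

THRESHOLD = 90  # "more than 90%", as used in the analysis

# Dicts keep insertion order, so the tests are listed in table order.
above = [test for test, pct in percent_copied.items() if pct > THRESHOLD]
print(f"{len(above)} of {len(percent_copied)} tests above {THRESHOLD}%: {above}")
```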
In general, Copyscape seemed to do a better job with matching text accurately, weeding out false positives (though it still was far from perfect) and still managed to find more results as often as not.
Overall, despite splitting the tests, I have to give Copyscape the edge when it comes to accuracy and filtering out bad results, making it all-around more useful, at least on the professional level.
Is PlagSpotter Worthwhile?
What all of this boils down to is a simple question: Is PlagSpotter a viable Copyscape competitor?
The answer, unfortunately, is less clear.
If you’re a free user, the answer is very simple: you’ll want to use PlagSpotter. Its results, though slower to arrive, are better than Copyscape’s in that case because Copyscape’s free results are limited to 10 matches.
In short, if you’re not looking to pay, you’ll definitely get better results with PlagSpotter. The only concern is that PlagSpotter’s team informed me that they plan to keep only “limited” free searches long term, meaning that edge could erode later on.
But, on the other hand, if you’re willing to pay 5 cents per search or for monthly monitoring, things get dicier. Copyscape is a well-entrenched service (around for 9 years) and for PlagSpotter to take its crown it’s not enough to battle it to a draw, it has to win and win handily, something it hasn’t done in these tests.
This view is actually shared by Stas Gladkov, a member of the PlagSpotter team: “Right now we cannot really compete with CopyScape, since they are one of those established players that dominate the PlagSpotter’s targeted market niche.”
So, while Copyscape may have “won” only four of the tests, PlagSpotter’s false positives and its inability to detect all of the matching text take away from its victories and give Copyscape a real edge, at least right now. On that note, if you are willing to spend 5 cents a search, you’re likely to be happier with Copyscape at the moment.
The possible exception is if you’re checking originality in a work rather than looking for its copies. PlagSpotter’s UI is very well-suited for that task, even if it’s not a task that it’s being marketed for directly.
Monthly monitoring is a much clearer call. In addition to Copyscape’s edge in the results, its service is also significantly cheaper. Ten pages monitored weekly will cost you $7.95 a month at PlagSpotter (a sale price; it is usually $9.95), while the same service at Copyscape is only $4.95. Likewise, 25 pages monitored weekly cost $8.70 at Copyscape but $10.95 at PlagSpotter (on sale from $14.95).
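To put those prices side by side, the sketch below computes the monthly difference and the per-page cost for both tiers. The figures are just the ones quoted above (PlagSpotter's are its sale prices at the time of writing), so treat this as back-of-the-envelope arithmetic, not a current price list.

```python
# Monthly prices for weekly monitoring, as quoted in this review (USD).
# PlagSpotter's figures are its sale prices; regular prices are in comments.
plans = {
    10: {"PlagSpotter": 7.95, "Copyscape": 4.95},   # PlagSpotter regular: $9.95
    25: {"PlagSpotter": 10.95, "Copyscape": 8.70},  # PlagSpotter regular: $14.95
}


def compare(pages, prices):
    """Return the monthly price gap and each service's cost per monitored page."""
    gap = prices["PlagSpotter"] - prices["Copyscape"]
    per_page = {name: cost / pages for name, cost in prices.items()}
    return gap, per_page


for pages, prices in plans.items():
    gap, per_page = compare(pages, prices)
    print(f"{pages} pages weekly: Copyscape is ${gap:.2f}/month cheaper "
          f"(PlagSpotter ${per_page['PlagSpotter']:.3f}/page, "
          f"Copyscape ${per_page['Copyscape']:.3f}/page)")
```

At both tiers Copyscape comes out ahead, and the gap would be wider still once PlagSpotter's sale ends.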
This is somewhat odd because Gladkov said that one of the key benefits of PlagSpotter was its price: “There are a couple of established players in the niche (referring, in part, to Copyscape) that offer pretty good quality; however, they are pricy and do not offer some features that the current web community requires.”
But despite Gladkov’s views, Copyscape is cheaper for recurring checks, and its service in general has more features than PlagSpotter’s.
Still, I would be remiss not to encourage you to try PlagSpotter. As the tests show, it does work better with some types of content, and that could include yours. But that doesn’t make it better than Copyscape all around, at least not yet.
Even if PlagSpotter didn’t blow me away, I’m still optimistic about it and its future. It is a beta product and it has come a long way for something that has been in existence such a short period of time.
However, PlagSpotter will have its work cut out for it. Gideon Greenspan, the creator of Copyscape, wrote to tell me that he has been working a great deal on Copyscape and has already performed a “major upgrade” on the backend that now allows the service to return results 5 times faster than before, along with extensive improvements in accuracy and speed.
Greenspan also said that they are now working on the UI, “with a full-site redesign and a number of new features coming out over the next year.”
On that note, if PlagSpotter hopes to catch up and overtake Copyscape, I’d like to offer these suggestions for PlagSpotter’s staff to help them improve their product moving forward:
- Address UI Issues: There needs to be a way to view matched content on a page and, ideally, share that view with a third party. This is my favorite feature in Copyscape and one that I can’t easily do without.
- Fix Speed Issues: Speed isn’t crucial when it comes to checking for plagiarism; PlagScan is much slower and still generally well-loved. Still, slow checks become frustrating when reviewing larger works or running several back-to-back.
- Missing Features: Finally, Copyscape has a slew of features, such as batch results, an API, a private index and case tracking, that give it a compelling edge. Not all of these features are implemented well (case tracking among them) or regularly used (the API, for example), but they are often very useful to the highest-end users.
Copyscape, on the other hand, could, I think, learn a bit from PlagSpotter. PlagSpotter’s UI is more attractive, and its default interface for reviewing plagiarism, though confusing at first, provides better information. If it were paired with all of Copyscape’s features, it would be outright superior.
But the most important reason for Copyscape to pay attention is the impact of PlagSpotter’s unlimited free service, which basically makes Copyscape a second-place tool for those who don’t want to pay. While that’s not the target audience for either service, it could prove to be an important one down the road.
In the end, the most important thing is that PlagSpotter is introducing competition into a field that had grown somewhat stagnant. The last service to launch that targeted this niche was Plagium, which opened in 2009 and saw its last major development in 2011.
Gladkov agrees with this as well: “One can say that CopyScape is a monopoly in the duplicate content checking area. PlagSpotter can bring in some competition here, which will make everyone better off.”
On that, there is no doubt. This shot across the bow can only make both Copyscape and PlagSpotter better and that’s a win-win for all webmasters who are interested in protecting their content.
It’s going to be a very exciting year for plagiarism detection.