Copyscape Improved Again

Last week, I received another email from Gideon Greenspan of Copyscape telling me that he and his team had made further tweaks to the system that should, in theory, drastically improve the service's ability to detect cases of mass plagiarism.

The service had already shown promising improvements after an unimpressive first review. However, even though the product was much better than before the changes, there were still caveats that prevented me from offering an unrestrained endorsement.

So the question became whether the new improvements would be enough to eliminate those caveats. I decided to find out and grabbed my test works for a third round of Copyscape testing.

Twice As Good

The first round of improvements had a low bar to clear. In the original test on the poem Teardrops, Copyscape had failed to find any matches. On the second pass, however, its performance improved drastically and it caught 10 of 25 results.

While that was a drastic improvement, it still left more than half of all copies undetected.

However, with the latest changes, Copyscape more than doubled the number of copies it detected. On the first poem, it caught 31 copies out of a potential 36. On the second poem, Copyscape caught 41 results out of what Google listed as 97 potential results. (Note: That number seems incredibly high, even to me, and I am currently investigating it.)

See the results from all three tests below:

First Run:

ccbefore1.png

Second Run:

ccafter1.png

Third Run:

teardrops 3

Some of the bump in the third test is due to increased plagiarism of the poem, largely by one or two people with whom I am working on a resolution. But even discounting the increased number of targets, the accuracy has gone way up. Where Copyscape previously caught fewer than half of all potential results, it now catches just over 86% (31 of 36).

The results on the second poem are much worse, at about 42% of the potential results, but that seems to be due at least in part to some strange behavior on Google's end with these results.
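For readers who want to check the arithmetic, here is a minimal sketch that recomputes the detection rates from the counts quoted in this post. The labels and the detection_rate helper are my own naming for illustration and are not part of Copyscape.

```python
# Counts quoted in this post: (copies caught, potential copies).
results = {
    "Teardrops, second run": (10, 25),
    "First poem, third run": (31, 36),
    "Second poem, third run": (41, 97),
}


def detection_rate(caught, potential):
    """Return the share of potential copies that were caught, as a percentage."""
    return 100.0 * caught / potential


rates = {label: detection_rate(c, p) for label, (c, p) in results.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1f}%")
```

The potential counts come from Google's listings, so a rate is only as trustworthy as that denominator, which is why the 97 figure for the second poem deserves suspicion.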

All in all, it is clear that Copyscape is now catching far more copies than it was previously and that the service is continuing to improve at a very rapid pace.

Recommendations

In a few short weeks, Copyscape has gone from a near-total dud when detecting mass plagiarism to what can now be called an impressive tool. Though further testing is needed to find out exactly how effective it is with other types of works, the improvements are more than obvious. (Note: Copyscape detected all copies of two short stories I submitted. However, both had only a few copies on the Web.)

Copyscape is a tool and it should be used as such. It should not be relied upon solely but, in its current state, can definitely be used in conjunction with other detection methods such as Google Alerts and Digital Fingerprints. It now seems to have a place in a well-rounded plagiarism detection strategy.

Longer works, clearly, will get more out of Copyscape than shorter ones. That is because the longer the work is, the more likely it is that a plagiarized copy lifted only part of it and left out the phrase you searched for. Also, with longer works, it can be harder to find a good statistically improbable phrase to search for, which makes it easier to let Copyscape do the legwork.
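To make the point about manual phrase searches concrete, here is a small sketch of how a single verbatim phrase search can miss a partial copy. The four-sentence "work", the search phrase and the phrase_search_hits helper are all made up for illustration, and this says nothing about how Copyscape itself matches text.

```python
# Hypothetical four-sentence work; the text is invented for this example.
original = [
    "The lighthouse keeper counted gulls at dawn.",
    "Her ledger held nine hundred quiet mornings.",
    "A velvet fog swallowed the harbor bell whole.",
    "By winter the ledger had outgrown its shelf.",
]

# The one distinctive phrase we would type into a search engine.
search_phrase = "velvet fog swallowed the harbor bell"


def phrase_search_hits(copy_text, phrase):
    """Return True if a verbatim phrase search would match this copy."""
    return phrase.lower() in copy_text.lower()


full_copy = " ".join(original)
partial_copy = " ".join(original[:2])  # a plagiarist lifted only the opening

print(phrase_search_hits(full_copy, search_phrase))     # True
print(phrase_search_hits(partial_copy, search_phrase))  # False
```

The longer the work, the more ways there are to copy a section that does not contain the chosen phrase, so a single manual search covers proportionally less of it.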

However, if you do use Copyscape, it would probably be best to go ahead and spend the few dollars for Copyscape Premium. The ten-result limit on the free search is crippling for works that either have large-scale plagiarism issues or generate a great deal of legitimate reuse.

I set up my account before the first test with five dollars and still have well over half of my searches left. It is probably the best deal available in plagiarism detection.

Conclusions

Any time you use an automated service such as Copyscape, you make a trade-off. You give up some control and effectiveness in exchange for speeding up the process. Copyscape has now hit a point where the trade-off is likely worthwhile.

Your mileage may vary and these tests are limited. However, other test results have shown a similar level of satisfaction with Copyscape.

In the not-too-distant future, I plan on doing a thorough analysis, similar to Dr. Weber-Wulff's, to determine how effective the various Web plagiarism tools are at finding all of the duplicate content out there, and ranking them accordingly.

Stay tuned for more information.

18 Responses to Copyscape Improved Again

  3. Will says:

    Nice to hear they are improving the shortcomings. It must be hard for authors of services like this to keep up. I notice spam comments getting better all the time. I got one yesterday that said:

    “healthy recipe are great …
    Many times, you’ll become surprised by the mountainous accumulation of healthy food for kids. info realizable….”

    Another had this in the comment section: “Great discussion on the energy saving bulbs better always cheaper soon to be good”

    Both only had one link (one to porn, the other to a page with lots of obviously scraped content interspersed with hundreds of links). They were also on topic, sort of, to the post they commented on, and to my blog in general, so they slipped by Akismet. I only saw them because I get notification and a copy of all comments to my email. On a busy site, they could have stayed posted.

    I see a market for a service that allows you to search all comments on your blog for suspicious ones that may have slipped through.

    I look forward to your analysis of how effective the Web plagiarism tools are.

    -Will

  4. JB says:

    Will,

    Thanks for the heads up. If you can email me the links, I’d appreciate it. I delete any spam comments that get through as soon as I see them, but most of them aren’t actually comments, they’re trackbacks. reCAPTCHA stops pretty much all of the comment spam.

    I tend to stamp those out as I get them but, in a moment of bitter irony, I’ll wager the emails for them are getting filtered out by my spam filters.

    I hadn’t even pondered that until now…

  5. Will says:

    Jonathan – The spam comments I referred to were on MY site, and were on topic to the posts they were put on. What I am thinking is that the spam bots are getting good at seeing keywords or something and making their spam close to the topic of the post. I guess the idea is they are more likely to slip through.

  7. JB says:

    Will: The tao of Homer comes to mind here – D’oh!

    I completely misread your comment and I’m sorry. That makes much more sense. Proof I either need reading glasses, likely, or have been hit on the head one too many times, also likely.

    I think you’re right about the nature of spam bots though. What I’ve seen some do is quote a portion of the original article and say “I Agree!” or something like that. Others say something like “The thing about <keyword from article> is that they can’t be <keyword2> and thus we can only hope to <keyword3>.”

    Sometimes it is garbage, but sometimes it actually works.

    I think you’re right about that, in short.

  8. [...] for for many Webmasters, the paid service starts at pennies a search and is well worth the money. Drastic improvements to the service have made it a force to be reckoned [...]

  9. [...] a great deal of progress to be seen. New plugins are constantly being developed to stop scrapers, search techniques are constantly being improved and new tracking methods are being [...]

  10. [...] the technique can detect verbatim plagiarism, it works best in situations where the amount of work copied word for word is low or moderate. This [...]

  11. [...] hacks in and of themselves, just significantly more user-friendly and elegant in nature. Sadly, despite improvements, this simplicity comes at the expense of [...]

  12. [...] is in stark contrast to detection and licensing companies, such as Copyscape that have listened to my issues and made changes to fix them. In fact, many of these companies go [...]

  13. [...] to be seen how effective it is. Match detection is not easy, even with a big search partner, as Copyscape showed. The system will not be of much use if its match detection is not the best in its [...]

  14. maria luisa says:

    hi. i find your article enlightening. but i am a newbie in writing and i tried copyscape. i'm no computer expert thus i can't seem to go beyond posting my article. can you help me how to check for plagiarism? thanks.

  15. [...] Copyscape Improved Again – PlagiarismToday [...]

  16. [...] Copyscape Improved Again | PlagiarismToday [...]

  17. muscle blogs says:

    Thanks for the review of Copyscape. It is really important to find people stealing your work because Google often penalizes you for duplicate content even though it was the other site that stole it.
