<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>plagiarism checker | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/plagiarism-checker/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Problem with Detecting Translated Plagiarism</title>
		<link>http://www.plagiarismtoday.com/2011/02/24/the-problem-with-detecting-translated-plagiarism/</link>
		<comments>http://www.plagiarismtoday.com/2011/02/24/the-problem-with-detecting-translated-plagiarism/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 18:03:07 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[translated plagiarism]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=9048</guid>
		<description><![CDATA[Translated plagiarism is the final frontier of automated plagiarism checkers, here is why it has been such a tough problem. ]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  class="alignleft size-full wp-image-9060" title="Google Translate Logo" src="http://www.plagiarismtoday.com/wp-content/uploads/2011/02/google-translate.jpg" alt="Google Translate Logo" width="230" height="52" />Dr. Debora Weber-Wulff is possibly best known for her recurring and stringent testing of plagiarism detection systems. Every year or two, she publishes a thorough report of her findings, <a href="http://www.plagiarismtoday.com/2011/01/13/plagaware-takes-top-honors-in-plagiarism-checker-showdown/">as she did in January</a>, that serves as probably the best barometer of such systems.</p>
<p>But while the &#8220;best&#8221; systems change out with almost every test, there is an overall trend of at least slow progress in the systems, especially in detecting smaller bits of plagiarism.</p>
<p>However, there is one area where there has been almost no progress: translated plagiarism. <a href="http://plagiat.htw-berlin.de/software-en/2010-2/">Even in the 2011 test results</a>, no system was able to effectively deal with the issue of translated plagiarism and, for the foreseeable future, that is likely to remain the case.</p>
<p>The reason is that the way automated systems detect plagiarism isn&#8217;t very well-geared toward detecting translated plagiarism and it&#8217;s unlikely any automated system, at least not for some time, will be very effective at it.</p>
<p>But this doesn&#8217;t mean that translated plagiarism will become rampant or even a safe haven for plagiarists. Just that other detection methods will need to be used.<span id="more-9048"></span></p>
<h4>How &#8220;Plagiarism&#8221; Detection Works</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  class="alignright size-full wp-image-9061" title="Turnitin Logo" src="http://www.plagiarismtoday.com/wp-content/uploads/2011/02/turnitin-logo.jpg" alt="Turnitin Logo" width="210" height="69" /></p>
<p>To be completely clear, plagiarism detection systems don&#8217;t actually detect plagiarism at all; they detect copies. The exact methods vary but the results are the same: the systems look for matching phrases and, when they are detected, begin to look deeper to see if there is more extensive matching between the documents.</p>
<p>The system is incredibly efficient and it enables automated plagiarism checkers to look through a mind-bogglingly large amount of content for any similarities. However, the weakness of the system is that it relies on exact matches. If you can change out enough words, you can easily fool plagiarism checkers. This is precisely <a href="http://www.plagiarismtoday.com/2005/12/05/synonymized-plagiarism-a-new-threat/">how synonymized or &#8220;spinning&#8221; plagiarism</a> works.</p>
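<p>As a rough illustration of why exact phrase matching is so easy to fool, here is a minimal sketch in Python. It is not how any commercial checker actually works; the function names, the three-word phrase length and the sample sentences are all assumptions made up for this example. It breaks each text into overlapping three-word phrases and reports what fraction of the suspect text&#8217;s phrases also appear in the source.</p>

```python
import re

def shingles(text, n=3):
    """Return the set of overlapping n-word phrases found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(suspect, source, n=3):
    """Fraction of the suspect's n-word phrases that also occur in the source."""
    a, b = shingles(suspect, n), shingles(source, n)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical sample texts, invented for this illustration.
source = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the river bank"
# Same sentence with four of its thirteen words swapped for synonyms.
spun = "the fast brown fox leaps over the idle dog near the stream bank"

print("verbatim copy:", overlap(copied, source))
print("spun copy:", round(overlap(spun, source), 2))
```

<p>In this toy case, swapping roughly one word in three leaves only one of the eleven three-word phrases intact, so a matcher that relies on exact phrases sees almost nothing in common between the two texts.</p>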
<p>To counter this, plagiarism checkers have routinely narrowed the holes in their net. According to iParadigms, the makers of <a href="http://turnitin.com">Turnitin</a> and <a href="http://ithenticate.com">iThenticate</a>, one would have to change one out of every three words in an essay or article to be reasonably assured it wouldn&#8217;t trip their detection. In fact, according to a lecture I attended at the <a href="http://www.plagiarismtoday.com/2008/06/30/recap-3rd-international-plagiarism-conference/">3rd International Plagiarism Conference</a>, the existing systems did a reasonable job, even in 2008.</p>
<p>However, translated plagiarism takes the problem exposed by spinning plagiarism and magnifies it many fold, pushing it well past what our current systems can handle.</p>
<h4>Why Translated Plagiarism is So Difficult</h4>
<p>There are at least three problems in trying to detect translated plagiarism and they each combine to make a near-impossible problem for automated systems to crack.</p>
<ol>
<li><strong>There is No One Right Way to Translate a Word:</strong> &#8220;Gato&#8221; may be Spanish for &#8220;Cat&#8221; but it also might mean &#8220;Feline&#8221;, &#8220;Tabby&#8221; and a slew of other synonyms. There is a lot of nuance to these words, but an automated system will see them as completely different.</li>
<li><strong>Different Languages Have Different Grammar Structures:</strong> Though most languages in a family will have a similar structure, even subtle nuances between languages will make it so that a word-for-word translation is not possible as a sentence in one language will have to be completely rewritten to be correct in another.</li>
<li><strong>No Effective Automatic Translation System:</strong> <a href="http://www.conveythis.com/translation.php">Bad Translator</a> illustrates this problem pretty well. It translates text from one language to another and then back again to English, often with hilarious results. Automated translation systems work well enough to be understood, but not well enough to detect exact matches.</li>
</ol>
<p>What this all adds up to is that, with translation, there is simply too much in the way of nuance and interpretation to be able to create an exact match out of translated plagiarism, at least with any reliability. Add to that the fact that most translated plagiarism also has some element of rewriting built into it, and there isn&#8217;t much that can be done.</p>
<p>A translated version of a work, once translated back, just isn&#8217;t going to match up with the original in any significant way, at least not in a way that can be detected through automatic systems.</p>
<p>Plagiarism detection systems could counter this by casting a wider net, accepting looser matches and trying to accept synonyms of words as exact matches. However, not only does this increase the amount of processing power needed to do detection, but it also opens up the door to false positives.</p>
<p>Automated plagiarism detection systems, in order to have any usefulness, need to strike a balance between catching a reasonable number of matches and not returning too many false positives. With too few actual matches, plagiarized works slip through routinely; with too many false positives, the results are useless and impossible to go through.</p>
<p>Casting a net wide enough to find translated plagiarism would, almost without a doubt, generate a lot of false positives, especially when dealing with several papers on the same or similar topic.</p>
<p>This means that, for the most part, automated detection of translated plagiarism will be all but impossible. Though checkers can and will get lucky from time to time, they won&#8217;t be able to do it reliably. However, the news is not all bad. In fact, the problem isn&#8217;t as great as many likely believe it to be.</p>
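<p>To make that trade-off concrete, here is a small sketch in Python. It is only an illustration under stated assumptions: the function names and sample sentences are invented, and phrase length stands in for how &#8220;wide&#8221; the net is. It scores a reworded copy and an independently written text on the same topic against one source, first with loose single-word matching and then with strict three-word phrase matching.</p>

```python
import re

def phrases(text, n):
    """Return the set of overlapping n-word phrases found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def score(suspect, source, n):
    """Fraction of the suspect's n-word phrases that also occur in the source."""
    a, b = phrases(suspect, n), phrases(source, n)
    return len(a & b) / len(a) if a else 0.0

# Hypothetical sample texts, invented for this illustration.
source = "plagiarism checkers compare student essays against a large database of sources"
# A copy of the source with a few words swapped out.
reworded = "plagiarism checkers match student papers against a huge database of sources"
# A different sentence that merely covers the same topic.
independent = "a large database helps checkers compare sources when student essays contain plagiarism"

for n in (1, 3):  # n=1 is the loose net, n=3 is the strict net
    print("phrase length", n,
          "| reworded copy:", round(score(reworded, source, n), 2),
          "| independent text:", round(score(independent, source, n), 2))
```

<p>In this contrived example the loose setting scores both texts at roughly 0.7, so the innocent text looks just as suspicious as the copy, while the strict setting scores both at around 0.1, so the copy slips through. Neither setting separates the two, which is the dilemma described above in miniature.</p>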
<h4>The Good News</h4>
<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  class="alignleft size-medium wp-image-9063" title="bad-translator-logo" src="http://www.plagiarismtoday.com/wp-content/uploads/2011/02/bad-translator-logo-300x54.jpg" alt="" width="300" height="54" /></p>
<p>The good news in all of this is the same good news that comes out of spinning plagiarism: Doing a quick job with it produces shoddy results, doing a good job with it requires a great deal of time.</p>
<p>There is a reason high-quality translation services are both difficult to find and expensive to procure. Good translation is difficult, time consuming and very specialized. The odds of a potential plagiarist being able to do a high-quality translation of a work in less time than it would have taken to simply produce an original creation with the other work cited correctly are slim.</p>
<p>For the most part, people who commit acts of translated plagiarism will be caught not by an automated system, but by teachers and professors who notice a change in the student&#8217;s writing or see clear errors in the work.</p>
<p>For this kind of plagiarism, humans are always going to be the best weapon as we are able to spot the imperfections such plagiarism inevitably creates.</p>
<p>In short, the mere fact automated systems aren&#8217;t able to easily detect translated plagiarism doesn&#8217;t mean that those who go that route will easily get away with the deed. In fact, they may even be more easily caught as human intuition is often easier to interpret than a plagiarism report.</p>
<h4>Bottom Line</h4>
<p>It is highly unlikely with the current approach of automated plagiarism detection that we will be able to spot translated plagiarism reliably. The current system just isn&#8217;t geared for it and casting a net wide enough to find it would also ensnare far too many false positives to be useful.</p>
<p>This just goes to show that plagiarism detection systems, no matter how good and useful they are, can never be magic &#8220;catch all plagiarism&#8221; machines. We cannot simply turn over our judgment on what is and is not plagiarism to any automated system because they will always have weaknesses and problems.</p>
<p>Plagiarism detection always requires a human element and the automated systems are merely tools that help the humans do a better job of figuring out what is copied.</p>
<p>Once educators, content creators and everyone else learn that, they can start using these tools as they were intended. That, in turn, will make the plagiarism atmosphere both in academia and online a lot clearer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/02/24/the-problem-with-detecting-translated-plagiarism/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>PlagAware Takes Top Honors in Plagiarism Checker Showdown</title>
		<link>http://www.plagiarismtoday.com/2011/01/13/plagaware-takes-top-honors-in-plagiarism-checker-showdown/</link>
		<comments>http://www.plagiarismtoday.com/2011/01/13/plagaware-takes-top-honors-in-plagiarism-checker-showdown/#comments</comments>
		<pubDate>Thu, 13 Jan 2011 18:29:18 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[ephrous]]></category>
		<category><![CDATA[plagaware]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[turnitin]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=8694</guid>
		<description><![CDATA[PlagAware, along with four other checkers, was deemed "partially useful" in the latest rounds of testing by Dr. Debora Weber-Wulff.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2011/01/htw-logo1.jpg" alt="" title="htw-logo" width="247" height="71" class="alignleft size-full wp-image-8696" /><a href="http://www.f4.htw-berlin.de/~weberwu/">Dr. Debora Weber-Wulff</a>, who is both a professor at the HTW Berlin in Germany and the author of the great <a href="http://copy-shake-paste.blogspot.com/">Copy, Shake, Paste blog in English</a>, has <a href="http://plagiat.htw-berlin.de/software-en/2010-2/">announced the results of her 2009/2010 plagiarism checker tests</a> and <a href="http://www.plagaware.com/">PlagAware</a>, a little-known service from Germany, has taken top honors.</p>
<p>Behind it was <a href="http://turnitin.com/">Turnitin</a>, the most popular academic plagiarism checker and behind them was <a href="http://www.ephorus.com/home">Ephorus</a>, a Netherlands-based plagiarism checking application.</p>
<p>All in all, the tests put some 48 different plagiarism checkers through 42 different tests, which included English, German and Japanese language tests involving whole plagiarism, edited text, translations and a few originals. Of those 48 systems, 26 were able to complete the tests and earn a final grade.</p>
<p>The final grade was determined by both how well the checker performed on the tests as well as how professional it was and how usable it was in an academic environment, specifically its workflow and how quickly it returned results. The checkers were then grouped into three classes &#8220;Partially Useful&#8221;, &#8220;Barely Useful&#8221; and &#8220;Useless&#8221;. </p>
<p>Since none of the plagiarism checkers were able to score above a 70% on Dr. Weber-Wulff&#8217;s tests, none of the services were given a &#8220;Useful&#8221; score and instead received the equivalent of a C+ on their grade.</p>
<p>However, the test may have also exposed several other problems with automated plagiarism checkers, issues that could directly impact content creators seeking to find a service to track their work.</p>
<h4>Problems and Interesting Results</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2011/01/GeKO_WeberWulff-2011_02-300x199.png" alt="" title="GeKO_WeberWulff-2011_02" width="300" height="199" class="alignright size-medium wp-image-8695" />The biggest gap in all the plagiarism checkers was the inability to locate translated plagiarism. While this is widely expected as the technology to make such detections simply is not there, it&#8217;s a hole in coverage that has remained since Weber-Wulff performed her first round of tests in 2004. </p>
<p>A more unusual and less-expected gap was the lack of coverage in Google Books. In every checker, text that was 100% plagiarized from Google Books failed to return more than 25% plagiarism. It appears that the Google API, upon which many of these services rely, does not cover Google Books, and that makes searching for plagiarism from books very difficult.</p>
<p>Also, umlauts and other non-English characters continued to present challenges to many plagiarism checkers though it was much less the case this time than in previous tests, indicating a better effort to internationalize plagiarism checkers.</p>
<p>Finally, according to Dr. Weber-Wulff, the spike in new plagiarism checking services has brought with it a number of services that appear to be less than honest about their intentions, including Viper Plagiarism Checker, which Weber-Wulff suspects is using its plagiarism checking service to harvest essays for its related essay writing service.</p>
<h4>Results</h4>
<p>As mentioned above, PlagAware took top honors in the tests, followed by Turnitin and Ephorus. Rounding out the &#8220;Partially Useful&#8221; category were <a href="http://www.plagscan.com/">PlagScan</a> and <a href="http://www.urkund.com/int/en/">Urkund</a>.</p>
<p>The &#8220;Barely Useful&#8221; category was made up of <a href="http://www.plagiarismfinder.de/">Plagiarism Finder</a>, <a href="http://www.docoloc.de/">Docoloc</a>, <a href="http://www.copyscape.com/">CopyScape</a>, <a href="http://www.safeassign.com/">Blackboard/Safe Assign</a>, <a href="http://plagiarisma.net/">Plagiarisma</a>, <a href="http://www.compilatio.net/en/">Compilatio</a>, <a href="http://strikeplagiarism.com/">StrikePlagiarism</a> and <a href="http://www.dustball.com/cs/plagiarism.checker/">The Plagiarism Checker</a> (<a href="http://www.plagiarismtoday.com/2008/12/16/review-the-plagiarism-checker/">Better Known as Dustball</a>).</p>
<p>The &#8220;Useless&#8221; category was (not linking to these as some are dubious) iPlagiarismCheck, Plagiarism Detector, UN.CO.VER, Genuine Text, Catch it First, Plagium, Viper, Plagiarism Search, Grammarly, Percent Dupe, Plagiarism Checker and Article Checker.</p>
<p>Dr. Weber-Wulff made it clear that her results are geared toward a very specific usage scenario, namely use in a German university. She also felt that even the most useful checkers were not ideally suited for checking every single student paper submitted, but rather, were useful when a professor had a suspicion of plagiarism and wanted to use an automated system to help track it down.</p>
<p>Still, the results are interesting and they show that smaller companies can, in at least some situations, be better than larger ones for plagiarism detection. The two biggest players, Turnitin and Blackboard came in second and eighth respectively. It also shows that there is a lot of fluidity in the market as Copyscape, <a href="http://www.plagiarismtoday.com/2008/11/04/copyscape-tops-plagiarism-checker-testing/">the winner of the last round of checks</a>, was in the &#8220;Barely Useful&#8221; category and was seventh overall.</p>
<p>Primarily though, it shows what I&#8217;ve known all along and that is the bulk of plagiarism checkers are garbage. I&#8217;ve said as much about some of the &#8220;Useless&#8221; services including <a href="http://www.plagiarismtoday.com/2010/05/18/review-un-co-ver-plagiarism-checker/">UN.CO.VER</a>, <a href="http://www.plagiarismtoday.com/2010/04/29/review-viper-anti-plagiarism-scanner/">Viper</a> and the &#8220;Barely Useful&#8221; service The Plagiarism Checker.</p>
<p>But as interesting as the results are, their application to readers of this site is actually fairly limited, and the reasons are pretty simple.</p>
<h4>Limitations and Caveats</h4>
<p>Dr. Weber-Wulff made it clear that her research was targeted at one use case, namely that of a German university. However, she did strive to make the results more applicable to other uses, namely by including other languages and various plagiarism types.</p>
<p>Still, readers of this site who are working to track their writing may not want to read too deeply into the results and use them more as a general guide. There are several reasons for this.</p>
<ol>
<li><strong>Usage Scenario:</strong> There are two types of plagiarism detection: the first is testing a work of unknown origin for authenticity and the second is finding copies of a known authentic work. This test looks at the first scenario while most readers of this site need the second, and the two require different skills. This may explain why Plagium performed so poorly in these tests, but reasonably well in mine. </li>
<li><strong>Language:</strong> The primary testing language was still German. Even though the test included both English and Japanese checks, the results will still likely skew toward those with strong German-language checking.</li>
<li><strong>Usability Requirements:</strong> Many checking their work for plagiarism won&#8217;t have the same usability concerns that a professor running through 200 student papers will. So usability issues that sank some of the checkers may not affect you.</li>
</ol>
<p>That being said, Dr. Weber-Wulff&#8217;s tests are definitely a good guide and a good starting point. That&#8217;s why I, over the next month or so, will be going through and looking at many of the plagiarism checkers that took top honors in her tests and see how they do in tracking content for the purposes of a content creator.</p>
<p>At the very least, the results are a solid indication as to how well the algorithms work in these checkers and how large their databases are, that alone is reason enough to give them a closer look.</p>
<h4>Bottom Line</h4>
<p>Quickly, I want to thank Dr. Weber-Wulff and her student assistant, Katrin Köhler, for performing these checks. The two of them spent over 9 months performing these checks and are still not 100% done. I also share their hopes that the German government, or another government, might take up the cause of funding these checks in the future so her ability to continue would not be tied to one university with the funding difficulties that come with that.</p>
<p>Though these tests aren&#8217;t perfect in that they are not all things to all people, they are important and useful as they provide an apples-to-apples comparison between the various checkers that are tested.</p>
<p>And that, in turn, is how I treat the results, not as a gospel on which plagiarism checker to use, but an unbiased test that compares the various services side-by-side in one usage scenario.</p>
<p>When treated that way, the tests become very useful and an important tool in determining which plagiarism checkers to look at.</p>
<p><em><strong>Photo Credit:</strong> c. 2011: Axel Völcker, DerWedding.de</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/01/13/plagaware-takes-top-honors-in-plagiarism-checker-showdown/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Plagiarism Checker Test Results to be Announced Friday</title>
		<link>http://www.plagiarismtoday.com/2011/01/04/plagiarism-checker-test-results-to-be-announced-friday/</link>
		<comments>http://www.plagiarismtoday.com/2011/01/04/plagiarism-checker-test-results-to-be-announced-friday/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 21:54:57 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[berlin]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[dr. weber-wulff]]></category>
		<category><![CDATA[english]]></category>
		<category><![CDATA[germany]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=8660</guid>
		<description><![CDATA[Dr. Weber-Wulff is preparing to announce the latest results of her testing of various plagiarism checkers and you're invited to attend the live announcement.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2011/01/htw-logo.jpg" alt="HTW Logo" title="HTW Logo" width="283" height="88" class="alignleft size-full wp-image-8661" /><a href="http://www.f4.fhtw-berlin.de/~weberwu/">Dr. Debora Weber-Wulff</a>, who is a professor at the <a href="http://www.fhtw-berlin.de/">University of Applied Sciences in Berlin</a>, is preparing to announce the &#8220;winners&#8221; of her latest round of plagiarism checker testing. </p>
<p>In her last round, which was in late 2008, Dr. Weber-Wulff surprised the world by <a href="http://www.plagiarismtoday.com/2008/11/04/copyscape-tops-plagiarism-checker-testing/">announcing that Copyscape had outperformed the competition</a>. This time, however, she has tested more services, 26 out of 47 possible, and has put them all through a rigorous test to see how well they do detecting a known amount of plagiarism.</p>
<p>Her tests are unique in that they are among the few truly scientific tests that compare plagiarism checker applications in an apples-to-apples environment. Though her conclusions are usually controversial, they are something of a gold standard that plagiarism checkers, legitimate ones at least, usually try to meet.</p>
<p>That being said, this time Dr. Weber-Wulff is doing more than simply posting the results on her site. Instead, she&#8217;s announcing the results over a livestreamed event. Specifically, there will be two announcements this Friday, January 7. <a href="https://webconf.vc.dfn.de/plagiat10/">The first will be in German at 0900 UTC</a>, which will be at 4 AM ET, and <a href="https://webconf.vc.dfn.de/plagiarism10/">the second will be in English at 1500 UTC or 10 AM ET</a>. </p>
<p>Everyone is invited to attend and the only requirement will be a computer capable of viewing Flash video and listening to audio. </p>
<p>For those who can&#8217;t attend, there will be a site for the results afterward and, of course, Plagiarism Today will have a thorough write-up on the outcome as well. </p>
<p>For more information, <a href='http://www.plagiarismtoday.com/wp-content/uploads/2011/01/Pressrelease-Nr-1-20110103.pdf'>see this press release</a>. Hope to see some of you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/01/04/plagiarism-checker-test-results-to-be-announced-friday/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Review: UN.CO.VER Plagiarism Checker</title>
		<link>http://www.plagiarismtoday.com/2010/05/18/review-un-co-ver-plagiarism-checker/</link>
		<comments>http://www.plagiarismtoday.com/2010/05/18/review-un-co-ver-plagiarism-checker/#comments</comments>
		<pubDate>Tue, 18 May 2010 20:18:50 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[custom articles]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[texbroker]]></category>
		<category><![CDATA[ucv]]></category>
		<category><![CDATA[uncover]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=6659</guid>
		<description><![CDATA[Textbroker has a new plagiarism checker that it has made freely available. But how does it work and does it get the job done?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/05/uncover-logo-300x71.jpg" alt="" title="uncover-logo" width="300" height="71" class="alignleft size-medium wp-image-6669"></p>
<p><a href="http://www.textbroker.com">Textbroker</a> is a marketplace for custom-written articles. Though such sites are often derided for, allegedly, turning a blind eye to plagiarism, Textbroker seems to be doing something about it. The company has made available <a href="http://www.textbroker.com/uncover/">a freeware application named UN.CO.VER</a>, which stands for UNique COntent VERifier, that it claims can both search for plagiarism within a work and seek infringing copies of the content.</p>
<p>The application is a Java app and can run on Windows, Mac or Linux and is only a 2MB download. It claims to be able to check content that is pasted in, a single URL or even an entire domain in one swoop.</p>
<p>At the encouragement of <a href="http://angelaswanlund.com/blog">a good friend of mine named Angela Swanlund</a>, who was on Textbroker&#8217;s mailing list and was offered the chance to use UN.CO.VER, I decided to put the program through its paces and see if it lived up to its description.</p>
<p>The answer surprised even me.<span id="more-6659"></span></p>
<h4>How it Works</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/05/uncover-sample-223x300.jpg" alt="" title="uncover-sample" width="223" height="300" class="alignright size-medium wp-image-6660"></p>
<p>UN.CO.VER (UCV from here on) is a fairly straightforward plagiarism checker. </p>
<p>Out of the box you are presented with three options:</p>
<ol>
<li><strong>Check Text:</strong> Lets you copy and paste in text for checking.</li>
<li><strong>Check Domain:</strong> Poorly named, this feature lets you check a single URL. You can set up beginning and ending strings to avoid checking unwanted text such as comments.</li>
<li><strong>Check Websites:</strong> This feature will attempt to crawl and perform a plagiarism check of an entire Web site, including up to 2 levels deep though more is possible through manual selection.</li>
</ol>
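<p>The beginning and ending strings in the &#8220;Check Domain&#8221; option are easiest to picture with a short sketch. The Python below is purely illustrative and is not taken from UCV; the function name, the sample page text and the fallback behavior when a marker is missing are all assumptions. It keeps only the part of a page that runs from the beginning string through the ending string, so that headers and comments never reach the checker.</p>

```python
def text_between(page_text, begin, end):
    """Keep only the text from the begin marker through the end marker.

    Assumption for this sketch: if the begin marker is missing, the whole
    page is returned, and if the end marker is missing, everything from
    the begin marker onward is kept.
    """
    start = page_text.find(begin)
    if start == -1:
        return page_text
    stop = page_text.find(end, start + len(begin))
    if stop == -1:
        stop = len(page_text)
    else:
        stop += len(end)  # include the ending phrase itself
    return page_text[start:stop]

# Hypothetical page content, invented for this illustration.
page = "Site header. Once upon a time there was a story. The end. 3 comments: great post!"
print(text_between(page, "Once upon", "The end."))
```

<p>Only the story itself survives the trim, which is the point of the setting: comments and navigation text would otherwise add noise to the search.</p>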
<p>In all three options, the process is pretty much the same. You set up the scan that you want and click &#8220;Check Now&#8221;. UCV will then go through the text and find any suspected duplicate content and report it in the space below. The process seems to take just a few seconds per 1000 words and, even in repeated testing, didn&#8217;t crash or create any errors.</p>
<p>The question, however, is how well UCV works. To find the answer, I decided to put it through its paces, testing each option individually.</p>
<h4>Where UN.CO.VER Gets Its Results</h4>
<p>Before I began the tests, I noticed that the UCV site and manual were unusually tight-lipped about where they were getting their results from, something that made me instantly suspicious. So before doing any reviewing, I decided to find out.</p>
<p>Using a tool called <a href="http://www.fiddler2.com/fiddler2/">Fiddler2 Web Debugger</a>, I routed all of UCV&#8217;s traffic through Fiddler2&#8217;s proxy and listened to the Internet traffic.<br />
<img src="http://www.plagiarismtoday.com/wp-content/uploads/2010/05/uncover-yahoo-proof-500x208.jpg" alt="" title="uncover-yahoo-proof" width="500" height="208" class="alignnone size-large wp-image-6665"></p>
<p>What I found was that UCV was using the <a href="http://developer.yahoo.com/search/boss/">Yahoo Search API</a> to find its results. Certainly not a bad way to do it and no reason to be secretive about it, but it is also a system available to everyone. </p>
<p>I also noticed that UCV was looking at the URLs provided by Yahoo and doing some kind of additional analysis as most of the URLs listed in Fiddler2 as being visited by UCV were not appearing in the results. This indicates that UCV may be more than a &#8220;dumb&#8221; plagiarism checker repeating results from Yahoo. </p>
<p>With that out of the way, I decided to do a few tests of UCV to see how well it performed.</p>
<h4>Check Text Feature</h4>
<p>To start testing the &#8220;Check Text&#8221; feature, I began with an article that had a known amount of reuse, namely last week&#8217;s column about how <a href="http://www.plagiarismtoday.com/2010/05/10/how-schools-are-hurting-the-fight-against-plagiarism/">schools are hurting the fight against plagiarism</a>.</p>
<p>Since UCV isn&#8217;t aware I run Plagiarism Today, it should report PT as a complete duplicate and, assuming no other reuse exists, nothing else.</p>
<p>Indeed, that is exactly what happened. UCV reported the original URL on Plagiarism Today and also listed other URLs that had an extremely small amount of matching content, all of which were false positives triggered by matches for strings such as &#8220;to cite sources&#8221;.</p>
<p>Next, I tried an old poem of mine that I knew to be widely copied and reused. UCV churned on the work and found seven copies of it. However, two of those copies were on my site, leaving only five potential plagiarisms. A quick search on Google, by contrast, easily found a dozen copies of the poem on sites other than mine, some legitimate, some not. </p>
<p>One thing that was impressive was that one of the results UCV turned up was a modified version of the poem that was only about 65% the same. Still, the results overall were incomplete.</p>
<p>Finally, in an attempt to see how it would handle &#8220;clean&#8221; text, I ran the draft of this article through the service. Surprisingly, UCV found some 24 potential matches though all were less than 5% of the unfinished article and were, once again, for very short strings such as &#8220;I decided to put&#8221;.</p>
<p>Toying with the sensitivity settings alleviated this problem some, but I wasn&#8217;t able to find a good balance between few false positives and good match detection.</p>
<h4>Check Domain Feature</h4>
<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/05/uncover-sample2-225x300.jpg" alt="" title="uncover-sample2" width="225" height="300" class="alignleft size-medium wp-image-6666"></p>
<p>Since I already had some idea as to UCV&#8217;s matching ability, I decided to simply test this feature on a short story of mine that I knew had a moderate amount of reuse.</p>
<p>I set up the system, being careful to include the beginning and ending phrases to avoid any extra content being searched, and let it do its thing. </p>
<p>Unfortunately, UCV came back with no useful matches, just a handful of false positives. However, a quick search on Google found at least three copies of the work in addition to the one on my site. Though it was clear UCV had parsed the text correctly, it simply did not find the critical matches.</p>
<p>Needless to say, this was very disappointing but it seems to be an issue with UCV&#8217;s matching, not the URL check feature as it did work correctly.</p>
<h4>Check Websites Feature</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/05/uncover-sample3-223x300.jpg" alt="" title="uncover-sample3" width="223" height="300" class="alignright size-medium wp-image-6668"></p>
<p>Finally, I decided to cut UCV loose on the whole of my site to see what it could do. I told it to check my entire old literature domain and do so 2 levels deep, meaning it should follow links on the home page and links on those secondary pages.</p>
<p>Almost immediately, I noticed that there seemed to be a lot of pages missing that should have been included as they were less than 2 levels deep. However, I decided to simply try with what it had and watched as UCV spun.</p>
<p>Unfortunately, that was the first problem. Though UCV seemed fairly quick when doing each item, it felt sluggish when going through so many. Though the time per item was about the same, for some reason the delay just felt that much longer when it was back-to-back with dozens of other works.</p>
<p>I ended up having to stop the search about halfway through as I couldn&#8217;t do other tests while this search was ongoing and I had some other checks to make. Still, I had enough time to let it get a few dozen results. </p>
<p>What I found with those results was a combination of a lot of false positives and a lot of inaccurate match totals. Since there was no way to tell UCV to ignore comments, it grabbed a lot of additional, unwanted content on every page and, combined with its sensitivity, threw back almost exclusively false positives. The few interesting matches it did find were listed with lower percentages than was accurate.</p>
<p>Without the ability to filter the content indexed, there isn&#8217;t much this feature can do. Unless your site exclusively has your own work (meaning no comments and almost no navigation, footer, etc.) then this feature is going to be wildly inaccurate for the most part.</p>
<h4>Bottom Line</h4>
<p>To be completely honest, UCV performed better than I expected, but only because my expectations were so breathtakingly low. Everything about this application screams &#8220;unprofessional&#8221; from the broken EULA (that is also in the demo video) to the poor word choices, lack of documentation and amateurish layout. The app looks and feels like it should trip on its own shoelaces and die.</p>
<p>However, it actually does a reasonable job with certain kinds of plagiarism checks. Though my testing was limited, if a work was plagiarized, it did seem to detect it, with the exception of the short story. This means that UCV may be at least somewhat useful for its stated task, checking content for plagiarism, and not trying to find every copy of an original work. Simply put, UCV was just too inaccurate to do that task.</p>
<p>That being said, my greatest concern with UCV is its affiliation with a custom-writing site. Rightly or wrongly, these sites have become known as hotbeds for plagiarism (due in large part to their relatively low payout to writers and emphasis on rush jobs). The tool can be seen as an attempt to help writers avoid accusations of plagiarism while engaging in copying that would, if detected, be considered as such.</p>
<p>Whether this is the goal or not is hard to say. But it does seem odd that a custom writing site would offer a plagiarism-checking tool to its writers to check their own work for accidental content misuse.</p>
<p>While mistakes do happen, they are far less common than intentional misuse and that alone makes me feel strange about this application.</p>
<p>Still, looking solely at the merits of the application, it does seem to work reasonably well for what it is designed to do. However, with so many other great tools, including CopyScape and Plagium, already available, I don&#8217;t see what the benefit of a standalone application is. Sure, its URL checking feature is pretty cool, but it isn&#8217;t much easier than just copying and pasting what you need, and its full site check is just too slow and too inaccurate to be of much use.</p>
<p>In short, give it a try if you want and consider it a useful addition to your toolbox, but don&#8217;t make it your primary checker for any purpose. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/05/18/review-un-co-ver-plagiarism-checker/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Review: Viper Anti-Plagiarism Scanner</title>
		<link>http://www.plagiarismtoday.com/2010/04/29/review-viper-anti-plagiarism-scanner/</link>
		<comments>http://www.plagiarismtoday.com/2010/04/29/review-viper-anti-plagiarism-scanner/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 19:29:23 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[plagiarism checking]]></category>
		<category><![CDATA[Review]]></category>
		<category><![CDATA[viper]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=6503</guid>
		<description><![CDATA[The Viper Anti-Plagiarism Scanner promises to be a free way for students to check their papers. But how well does the application work?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/04/viper-logo.jpg" alt="" title="viper-logo" width="217" height="85" class="alignleft size-full wp-image-6513"></p>
<p>It is the time of year when term papers are coming due and students are worrying more than ever about being accused of plagiarism. It&#8217;s the busy season for plagiarism detection services both for teachers and for students and, as is typical, I get asked my opinions on them.</p>
<p>One email recently asked me my thoughts on the <a href="http://www.scanmyessay.com/index.php">Viper Anti-Plagiarism Scanner</a>, a free application and service provided by Scan My Essay. I&#8217;ve been familiar with the service for a while but never bothered to test it. However, since it was requested, I decided to put the application through its paces.</p>
<p>Unfortunately, the results were less than spectacular and, truth be told, the application fell almost completely flat. The only question is whether the problems were glitches caused by a temporary problem or something more chronic with the program.<span id="more-6503"></span></p>
<h4>How it Works</h4>
<p>The idea behind Viper is that you download the small application, less than 1MB, and register for an account with the service. Once you&#8217;ve done that, you simply select the file or files you want to check for plagiarism and send Viper on its way. Viper, after some processing, will come back with the results.</p>
<p>Though the process is simple, and familiar to anyone who has used a plagiarism checker in the past, it does have a few interesting features. One of the biggest is its ability to match against a local database, the Web or both. This means that, if you have a pool of content you want to test against, you can do that with or without also checking the broader Web.</p>
<p>The results page also uses a very effective layout, showing the uploaded work side by side with the suspected matches. This is very convenient for analyzing the match and makes developing an opinion about whether an element is plagiarized or not very simple.</p>
<p>Beyond those two features, both of which can actually be found in other applications or services, the rest of the application is fairly straightforward. While that is not a bad thing in and of itself, the problem is that it doesn&#8217;t seem to do the job it set out to. </p>
<h4>My Tests</h4>
<p>Setting up the application proved difficult. It took several tries to get the application to install correctly and almost 15 minutes to figure out how to create an account (hint: you have to click the link in the program itself). Though it took a while, about 40 minutes, I was eventually able to get the application up and start testing.</p>
<p>As with any test of a plagiarism checker, I start out by having it search for a work where there is a known amount of plagiarism. In this case, I started with an <a href="http://www.whoishostingthis.com/blog/">article that I had submitted to Who Is Hosting This?</a> but still had the old RTF for. The work has not been widely plagiarized, but does appear on the site so Viper should have registered the RTF as a 100% plagiarism.</p>
<p>Unfortunately, after letting Viper chew on the file for some time, it came up with nothing but a few short quotes, each a few words long, that were coincidence or properly cited. Though the work is a perfect &#8220;plagiarism&#8221;, Viper found only minor and incidental matching, all of it less than a few percent.</p>
<p>I decided to wait a few days before trying again and did so this morning, starting with an article I had written for the <a href="http://www.ejc.net/magazine/article/guardian_feeds_its_readers/">European Journalism Centre</a>. This article, much like the previous one, only exists on the one site. However, it should still come back as 100% plagiarized. </p>
<p>Unfortunately, I never got results from this article. After uploading it and letting it spin for over ten minutes, nothing happened. The analysis of the article simply froze.</p>
<p><img src="http://files.plagiarismtoday.com/wp-content/uploads/2010/04/viper-sample-e1272567510244-500x377.jpg" alt="" title="viper-sample" width="500" height="377" class="alignnone size-large wp-image-6504"></p>
<p>I tried it again repeatedly with the same article but received the same result.</p>
<p>For my last test, I tried <a href="http://www.ravensrants.com/in-the-dark/">an old poem of mine</a> that I knew was widely copied, both with and without permission. I uploaded this one to the service but the first time it completed it found nothing. I tried again and the process froze up, even crashing the application. I tried it one more time and, finally, got an affirmative result.</p>
<p><img src="http://files.plagiarismtoday.com/wp-content/uploads/2010/04/viper-sample3-e1272567238355-500x379.jpg" alt="" title="viper-sample3" width="500" height="379" class="alignnone size-large wp-image-6506"></p>
<p>Viper, after nearly two hours of setup and failed searching, finally had detected a single case of &#8220;plagiarism&#8221; spotting the URL where the poem can be found on the Web.</p>
<p>Needless to say though, this small victory has me much less than impressed.</p>
<h4>More Problems</h4>
<p>In the two hours I had allotted to test Viper, I had only been able to search for three documents and only one of those searches, after many retries, was successful. In the same amount of time, I could have processed many dozens of documents using virtually any other means. In fact, my <a href="http://www.plagiarismtoday.com/2008/02/14/search-engine-showdown-testing-plagiarism-detection/">search engine showdown</a> post was compiled in about the same time I spent testing Viper and it required some 45 searches.</p>
<p>But in addition to Viper being slow and unreliable, it also has me a bit creeped out. The application, on Windows 7 at least, requires special permission to modify content on the hard drive. Though I don&#8217;t believe it is a virus or has any malicious intent, <a href="http://download.cnet.com/Viper-the-Anti-plagiarism-Scanner/3000-2051_4-10795356.html?tag=mncol">reviewers at CNet</a> have warned that it messed with their Word settings and that they suffered many crashes. That is not the kind of program I want having broad access to my computer.</p>
<p>In short, I would not recommend installing this program on your computer at this time. Looking at the decidedly mixed reviews on CNet, it seems as if my experience was fairly typical though others have had even worse problems. </p>
<h4>Bottom Line</h4>
<p>From what I can tell, Viper is an application that feeds into a Web service. Why it needs a downloaded app when the real work takes place online is unclear, but it seems likely that the problems are with the server end, not the app itself (other than the installation issues).</p>
<p>If they can correct their server problems, Viper has a lot of potential for doing decent plagiarism checking.</p>
<p>However, for students seriously worried about their papers, I would urge you to go ahead and spend the small amount of money and use <a href="http://www.writecheck.com/static/home.html">WriteCheck</a>. Not only does it use the same database as most colleges and high schools, the Turnitin one, it doesn&#8217;t index your paper and it has access to private libraries and collections Viper can&#8217;t see. Furthermore, the matching technology, while imperfect, seems to be better.</p>
<p>As an alternative, you can use <a href="http://copyscape.com">Copyscape</a>, <a href="http://plagium.com">Plagium</a> or simple Google queries to check for accidental plagiarism. All will work faster and better than Viper in its current form.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/04/29/review-viper-anti-plagiarism-scanner/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>5 Reasons Google is My Primary Plagiarism Checker</title>
		<link>http://www.plagiarismtoday.com/2010/02/09/5-reasons-google-is-my-primary-plagiarism-checker/</link>
		<comments>http://www.plagiarismtoday.com/2010/02/09/5-reasons-google-is-my-primary-plagiarism-checker/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 18:19:47 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Attributor]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[copyscape]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[icopyright]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[plagium]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=5530</guid>
		<description><![CDATA[With all of the powerful tools out there for detecting plagiarism, is it possible Google is still the best?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  alt="Google&#039;s Logo" src="http://files.plagiarismtoday.com/wp-content/uploads/2010/02/google-logo4.jpg" title="Google Logo" class="alignleft" width="304" height="119"></p>
<p>Whether you are a writer looking for plagiarized copies of your work or a teacher/professor checking academic papers for plagiarism, Google is your friend.</p>
<p>Google provides, by far, the easiest way to perform quick plagiarism checks, whether you want to find out if a work is plagiarized or if it has been the victim of plagiarism. It does so for free and in a very robust way.</p>
<p>Though there are a lot of great tools out there with many great uses, Google remains my first stop for plagiarism checks in most cases as it is simply faster, cheaper and more accurate than most other tools.</p>
<p>Though you shouldn&#8217;t use it exclusively and definitely should not shy away from using additional tools, you need Google in your arsenal and you need to learn how to use it well. Otherwise, you may find yourself spending more time and money than needed while not getting the results you desire.<span id="more-5530"></span></p>
<h4>Why Google</h4>
<p>When deciding where to start with your plagiarism check, consider the five following reasons to start with Google:</p>
<ol>
<li><strong>Human Analysis is Best:</strong> It is pretty trivial for a human to find a statistically improbable phrase that is likely to be reused. Some plagiarism checkers don&#8217;t ignore quoted and cited content and all search for content that is likely repeated without plagiarism. This means a few seconds spent on the front end finding a good phrase can save hours on the backend filtering through false positives. Furthermore, over-reliance on more automated systems can result in users taking the results as gospel and not performing adequate human evaluation. This can be a tremendous mistake.</li>
<li><strong>Immediate, Accessible and Free:</strong> Even a complicated Google search is returned within a few seconds. Some other checkers take days to process matches, and even the faster ones usually take a few minutes, which hinders their usefulness in checking hunches. Also, Google is free to use and is available anywhere you have an Internet connection, even via your phone. The service that fits in your schedule and budget is the one you will use and if you don&#8217;t use a plagiarism checker, it can do no good at all.</li>
<li><strong>Accuracy:</strong> In my experience, Google produces far fewer false positives than even more advanced plagiarism checkers. It also has a very large database with billions of pages, including PDFs, Word files and other non-HTML formatted content. It also updates in very close to real time with Google News and blog search, making it great for finding instances of plagiarism that take place quickly after publication.</li>
<li><strong>It&#8217;s What You Care About:</strong> If your work is plagiarized and the plagiarism isn&#8217;t in Google, does it exist? It&#8217;s a valid question and, if you&#8217;re a content creator worried about SEO, the answer is probably no. Other checkers that don&#8217;t work off Google&#8217;s database may cause you to spend time and resources on leads that don&#8217;t matter. Other databases are usually slower to update. Also, Google tends to do a good job of prioritizing matches for you, starting with those that are more important. Finally, Google, in my experience, is the most popular means for students to plagiarize their work, making it a logical tool to backtrack any suspected plagiarism.</li>
<li><strong>It&#8217;s Dead Simple:</strong> Everyone knows how to do a Google search. Not everyone knows how to format a paper for submission to another service. It&#8217;s a method anyone can use with almost no training at all, including those easily intimidated by technology.</li>
</ol>
<p>In short, Google is easy to use, very fast and provides very accurate, broad results for the total price of free. Though it isn&#8217;t the perfect plagiarism checker by any stretch, when others ask me to quickly check a work for them, it is where I usually start. If something trips my sensors, I will often use another checker, such as Plagium or CopyScape, to drill down deeper. </p>
<p>None of this is intended as a slight against other plagiarism checkers. In fact, there are many legitimate needs that they fill.</p>
<h4>Google&#8217;s Limitations</h4>
<p>As great as Google is, there are still limitations to what it can do and those limitations are often filled very well through other services. Consider the following:</p>
<ol>
<li><strong>Organization and Resolution Assistance:</strong> Google simply provides results, it is up to you to organize them and take action on them. Services like <a href="http://attributor.com">Attributor</a> and <a href="http://icopyright.com">iCopyright Conductor</a>, which are aimed at larger content creators, and <a href="http://turnitin.com/static/index.html">Turnitin</a> and <a href="http://www.safeassign.com/">SafeAssign</a>, which are aimed at schools, provide that organization. This makes managing large case loads much more bearable.</li>
<li><strong>Additional Sources:</strong> Plagiarism checkers that specialize in academic environments, including Turnitin, include additional databases that are not available to Google, including private article databases and research papers.</li>
<li><strong>Full-Work Matching:</strong> Though Google is great for quick checks and finding potential matching pages, determining which content is matching and which isn&#8217;t is a headache by hand. More robust checkers will highlight the duplicate content and make it easy to see at-a-glance what has been copied. Plagiarism checkers such as <a href="http://www.copyscape.com">Copyscape</a>, which is based on Google, and <a href="http://plagium.com">Plagium</a> are natural additions to Google in this area. Also, collusion detection tools such as <a href="http://www.plagiarism.phys.virginia.edu/Wsoftware.html">WCopyFind</a> can check two suspect documents, such as one Google suspects, and highlight matching portions.</li>
</ol>
<p>In short, these tools have a time and a place. I still recommend them highly and use them widely depending on the project and situation. However, they do some of their best work after Google or another search engine has alerted the searcher to the possibility of plagiarism and a deeper look is needed to determine how significant the potential infraction is.</p>
<h4>Bottom Line</h4>
<p>When someone asks me to check and see if a work is plagiarized, especially if they are wanting me to see if the work appears anywhere else on the Web, I usually turn to Google first. Though other checkers are great, Google simply does the best job of letting me know how much copying the work has seen, who the most important infringers/likely sources are and if further research is needed.</p>
<p>Unless Google alerts me that there is a likely problem, I know that other services will most likely be a waste of time that will possibly have me swimming through false positives or simply waiting for results. All in all, it is time lost that could be better spent elsewhere. </p>
<p>For most searches, Google is my primary tool of choice. Though it isn&#8217;t usually the last word on whether or not a work has been plagiarized, it tells me what I need to know and helps me better determine what I need to do next. It is my first choice of plagiarism checker, the default tool I reach for, but that doesn&#8217;t make it the only one I use.</p>
<p>Regardless, learning how to use Google for plagiarism detection and learning how to use it well should be the first priority for anyone wanting to find duplicate content, whether of their own work or to detect plagiarism in others&#8217;. Without it, you won&#8217;t be as effective at plagiarism detection nor as able to perform the task.</p>
<p>Simply put, relying on a plagiarism checker to make decisions for you is a poor move, especially with the danger of false positives. Human judgement is the best and Google lets you exercise it some before bringing in the bigger guns.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/02/09/5-reasons-google-is-my-primary-plagiarism-checker/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>1st International Plagiarism Detection Competition</title>
		<link>http://www.plagiarismtoday.com/2009/05/13/1st-international-plagiarism-detection-competition/</link>
		<comments>http://www.plagiarismtoday.com/2009/05/13/1st-international-plagiarism-detection-competition/#comments</comments>
		<pubDate>Wed, 13 May 2009 17:27:45 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Contest]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=3468</guid>
		<description><![CDATA[If you develop or help work on a plagiarism detection system, you may want to register for 1st International Plagiarism Detection Competition for a chance to prove how good your system is and claim the cash prize. ]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/05/pan-logo.jpg" alt="pan-logo" title="pan-logo" width="250" height="80" class="alignleft size-full wp-image-3469" /></p>
<p>Though I recognize that most of the readers of this site are Webmasters eager to protect their content, I also know that more than a few developers of plagiarism detection tools read this blog. For them, I wanted to do a quick post about the upcoming Spanish Society for Natural Language Processing 2009 conference, which is hosting a PAN workshop on plagiarism analysis, authorship identification and &#8220;social software misuse&#8221;.</p>
<p>As part of this PAN workshop, <a href="http://research.yahoo.com/">Yahoo! Research</a> is hosting what it is calling the <a href="http://www.webis.de/pan-09/competition.php">1st International Competition on Plagiarism Detection</a>, which it hopes to make an annual event.</p>
<p>The competition pits plagiarism detection systems against one another to test their accuracy and completeness.</p>
<p>Specifically, there are two tasks:</p>
<ol>
<li><strong>External Plagiarism Analysis:</strong> This task provides contestants with suspect documents and source documents and requires the system to find the plagiarized passages. </li>
<li><strong>Intrinsic Plagiarism Analysis:</strong> This task requires contestants to detect plagiarized passages WITHOUT comparison to outside documents, for example, by detecting shifts in writing style.</li>
</ol>
<p>The competition is providing the documents to be tested, estimated to be at 20,000 source and 20,000 suspect documents of various sizes with various amounts and kinds of plagiarism. The documents are primarily in English and the plagiarism has been &#8220;perpetrated&#8221; by a software application that randomizes the amount plagiarized, the obfuscation and even, in some cases, translation.</p>
<p>The competition is open to commercial plagiarism checkers but requires that submissions be provided in a set XML format to make it easier for the organizers to process the output (due to the large volume of plagiarism). This may mean that some services have to &#8220;hack&#8221; their output to fit the standards of the competition. </p>
<p>The winning product receives 500 Euros and submissions are being accepted until June 7th, 2009. Please see the link above for the specific rules.</p>
<p>This is not the first time a broad array of plagiarism detection suites have been put to the test. In November of last year, <a href="http://www.f4.fhtw-berlin.de/~weberwu/">Dr. Debora Weber-Wulff</a>, a professor at the <a href="http://www.fhtw-berlin.de/">University of Applied Sciences in Berlin</a>, <a href="http://plagiat.htw-berlin.de/software/2008/">announced the results of her second round of testing</a> and <a href="http://www.plagiarismtoday.com/2008/11/04/copyscape-tops-plagiarism-checker-testing/">gave the top prize to Copyscape</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2009/05/13/1st-international-plagiarism-detection-competition/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>10 Dollar Articles Plagiarism Checker</title>
		<link>http://www.plagiarismtoday.com/2009/02/10/10-dollar-articles-plagiarism-checker/</link>
		<comments>http://www.plagiarismtoday.com/2009/02/10/10-dollar-articles-plagiarism-checker/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 18:12:38 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2770</guid>
		<description><![CDATA[A new plagiarism checker promises to be both a competitor and a complement to Copyscape. But can it live up to its own marketing?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-logo.png" alt="10da-logo" title="10da-logo" width="272" height="45" class="alignleft size-full wp-image-2786" />It has become all the rage in recent months for programmers to build or revamp plagiarism checkers using Google and other search engines. Most of these plagiarism checkers, <a href="http://www.plagiarismtoday.com/2008/12/16/review-the-plagiarism-checker/">such as the &#8220;Dustball&#8221; checker</a>, fail to produce adequate results. </p>
<p>The problem is that phrase selection is not a simple task. It can be difficult for human beings to determine what phrases or sentences to search for, let alone a simple algorithm. As a result, such simplistic plagiarism checkers often either miss a large number of results by choosing phrases that don&#8217;t work well with the search engines or produce a slew of false positives by selecting terms that are too common or too short.</p>
<p>Thus, when I <a href="http://www.10dollararticles.com/blog/free-beta-plagiarism-checker/28/">read about a new SE-based plagiarism checker</a>, this one <a href="http://www.10dollararticles.com/plagiarism-checker.htm">by SEO content writing service 10 Dollar Articles</a> (10DA), I was skeptical at best.</p>
<p>Though a cursory search confirmed many of my original suspicions, it also showed that the plagiarism checker isn&#8217;t quite as useless as many of its brethren. Though it has its flaws and certainly isn&#8217;t as useful as its marketing might say, it does have some interesting features and potentially compelling uses.<span id="more-2770"></span></p>
<h4>How it Works</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-sample.png" alt="10da-sample" title="10da-sample" width="269" height="101" class="alignright size-full wp-image-2790" />The 10DA checker works like many similar services. Users copy an article or piece of content that they want to check for plagiarism, choose up to three services to use and hit the submit button.</p>
<p>The service then selects a series of five or six snippets from the work and runs them through each of the selected search engines. When it&#8217;s done, it links to each of the results pages and the user can go through the results to see if there are any suspicious matches.</p>
<p>The site keeps track of which results pages you have already visited, turning those numbers to black, and also lets you recheck the article with a different set of search engines.</p>
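<p>To make the process above concrete, here is a minimal sketch of how a checker of this kind might pick snippets and build results links. It is not 10DA&#8217;s actual code, which isn&#8217;t public. The function names, the evenly spaced sentence selection and the engine URL patterns are all assumptions made for illustration.</p>

```python
import re
from urllib.parse import urlencode

# Hypothetical engine table. The URL patterns are assumptions for
# illustration, not anything 10DA documents.
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
}


def pick_snippets(text, count=5):
    """Return up to `count` sentences spread evenly through the text."""
    parts = re.split(r"[.!?]+\s+", text)
    sentences = [p.strip(" .!?") for p in parts if p.strip(" .!?")]
    count = min(count, len(sentences))
    if count == 0:
        return []
    step = len(sentences) / count
    return [sentences[int(i * step)] for i in range(count)]


def result_links(text, engines, count=5):
    """Build one exact-phrase results URL per snippet per engine."""
    links = []
    for snippet in pick_snippets(text, count):
        for name in engines:
            base, param = ENGINES[name]
            # Wrap the snippet in quotes so the engine treats it as a phrase.
            query = urlencode({param: '"' + snippet + '"'})
            links.append((name, snippet, base + "?" + query))
    return links
```

<p>Note that the sketch stops at producing links, which mirrors what the 10DA checker does: the user still has to open each results page and judge the matches.</p>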
<h4>The Good</h4>
<p>The real benefit of this system is that it is extremely simple to use and free. Since the product is in beta, anyone can paste text in and run it through the system.</p>
<p>The service also stands out somewhat in that it allows users to run the search through different search engines, unlike others that focus solely on Google. With the 10DA checker, you can easily search MSN and Yahoo! as well as Bloglines and more. Though many of the choices seem superfluous, especially the multiple Google services normally covered underneath the main search (such as searching either Wikipedia or Google Knol), the addition of extra choices is an interesting one. </p>
<p>That being said, it isn&#8217;t the first checker to offer this service; <a href="http://www.articlechecker.com/">others have been doing so for some time</a>. </p>
<p>Though it is unclear how much benefit one gets from running the same article through three different engines, it is easy to see how those eager to be extremely thorough may be tempted by that feature.</p>
<h4>The Bad</h4>
<p>Where the 10DA checker struggles the most is in the value that it adds, or lack thereof. Where Copyscape compiles the results from the various Google queries it makes and displays them in a simple results page, 10DA requires users to click through to each individual results page and do the actual legwork themselves. At this time, the 10DA checker does not even provide an indication of the number of matches in the specific results pages.</p>
<p>Due to this, the results that one gets from the 10DA checker could be easily replicated by going to the individual search engines and doing the searches for yourself. The 10DA checker does not even automatically select to view similar matches, meaning that the initial display only includes one or two copies of the work in question.</p>
<p>With no match highlighting, organization or other input from the system, essentially it is the same as performing 5-18 individual searches at once. Since only one search is usually all that is necessary to prove that a work is plagiarized, one has to wonder how useful this really is.</p>
<h4>My Tests</h4>
<p>As with most plagiarism checkers I review, I ran the site through a short series of tests to see how the results compared with stock Google searches. Since the system still primarily uses Google, this would be a true &#8220;apples to apples&#8221; comparison.</p>
<p>The first test involved a prose work of mine that I know has been plagiarized many times before. I ran it through the 10DA checker and the best result of the six phrases checked in Google was 26 matches. However, after tinkering with the search term, namely by shortening it and removing punctuation, I was able to improve it to 31 results.</p>
<table cellspacing=15>
<tr>
<td><strong>10DA Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-results.png" alt="10da-results" title="10da-results" width="284" height="99" class="alignnone size-full wp-image-2771" /></td>
</tr>
<tr>
<td><strong>Tweaked Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/my-results.png" alt="my-results" title="my-results" width="284" height="99" class="alignnone size-full wp-image-2772" /></td>
</tr>
</table>
<p>The reason for this is that the phrases the 10DA checker chooses seem, to me, to be extremely long. Where I can usually find a good statistically improbable phrase between seven and nine words long, all of the phrases chosen by the 10DA checker were over a dozen words, and some ran as long as 19. </p>
<p>Though the longer strings do reduce false positives, choosing a good unique phrase is more important in that regard. This is something that the 10DA checker struggled with as some of the results had only one match, indicating that the phrase selected was of poor quality.</p>
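<p>The kind of tweaking I did by hand, removing punctuation and shortening the phrase, can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and using total word length as a stand-in for how unusual a phrase is happens to be my own crude assumption, not a real measure of statistical improbability.</p>

```python
import re


def tighten_phrase(phrase, max_words=9):
    """Strip punctuation and keep the window of words with the most letters."""
    # Assumption: every punctuation mark, apostrophes included, becomes a space.
    words = re.sub(r"[^\w\s]", " ", phrase).split()
    size = min(max_words, len(words))
    if size == 0:
        return ""
    # Crude heuristic: longer words tend to be rarer, so prefer the
    # run of `size` words with the largest total length.
    starts = range(len(words) - size + 1)
    best = max(starts, key=lambda i: sum(len(w) for w in words[i:i + size]))
    return " ".join(words[best:best + size])
```

<p>A human can still usually do better by eye, since no word-length trick knows which turn of phrase is truly distinctive to a given work.</p>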
<p>I also quickly tested the checker with a poem that I knew to be heavily plagiarized. However, many of the searches, due to an issue with apostrophes, came back with no results, making them false negatives. Of those that did return results, the highest had 25 but, once again, by tweaking the search term, I was able to increase that number to 28. However, using my own phrase, I was able to find several hundred results.</p>
<table cellspacing=15>
<tr>
<td><strong>10DA Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-results12.png" alt="10da-results12" title="10da-results12" width="263" height="74" class="alignnone size-full wp-image-2775" /></td>
</tr>
<tr>
<td><strong>Tweaked Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-myresults2.png" alt="10da-myresults2" title="10da-myresults2" width="263" height="74" class="alignnone size-full wp-image-2776" /></td>
</tr>
<tr>
<td><strong>My Phrase Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-myresults3.png" alt="10da-myresults3" title="10da-myresults3" width="263" height="74" class="alignnone size-full wp-image-2777" /></td>
</tr>
</table>
<p><em>(Note: The high number of results from my phrase is likely due in large part to matches on the same domain. However, in a cursory check of the first few pages of results, I did see at least some positive matches that were not in the first two.)</em></p>
<p>The end result is that most people will find it pretty trivial to get better results than the 10DA checker. If they can look at the phrase selected, remove punctuation and pull out a good section of unique content, they can increase the effectiveness of the search. </p>
<p>However, why one would do that is a bit of a mystery. If you&#8217;re going through all of these motions and need the added matches that come from a better phrase, you&#8217;re probably going to find it faster and easier just to pull the phrase yourself directly from the content and then perform your own search.</p>
<h4>Conclusions</h4>
<p>Even though the site&#8217;s marketing material says that it is both a competitor and a complement to Copyscape, Copyscape is by far the more useful service. Though 10DA seems to be about on par with the number of matches Copyscape catches, the usability of Copyscape is much higher and well worth the five cents per search in most cases.</p>
<p>Still, if you&#8217;re looking to do a quick plagiarism check of an article before you post it on your site, something my wife has to do as her company&#8217;s blog editor, it might be a useful service. If you don&#8217;t feel like setting up a Copyscape account or don&#8217;t mind the extra step of visiting the results, then it could be useful.</p>
<p>However, I cannot recommend this service for checking for duplicate content of your site&#8217;s material. You can get more accurate matches by hand and the amount of energy that is saved by using the 10DA checker is pretty minimal. Even the free version of Copyscape provides good matching and much better usability.</p>
<p>But even that seems somewhat defeatist. With <a href="http://fairshare.cc">Fairshare</a> bringing <a href="http://www.plagiarismtoday.com/2009/02/03/attributor-announces-fairshare-service/">professional-grade matching technology and automatic updates to bloggers</a>, there is no reason that bloggers or other RSS providers should be punching in their articles by hand to check for plagiarism.</p>
<p>Static content may have different needs, but with <a href="http://www.plagiarismtoday.com/2008/01/24/video-how-to-use-google-alerts/">Google Alerts</a> and <a href="http://www.plagiarismtoday.com/2008/07/01/bitscan-release-copy-alerts/">CopyAlerts</a>, there is little reason to manually check those results either.</p>
<p>In short, the age of copying and pasting textual content to see where it has appeared on the Web is fast ending. That is good news, though, as the easier to use and more automated the systems become, the more likely bloggers and other writers are to use them.</p>
<p>Hopefully, similar systems for images, audio and video are also fast coming.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2009/02/10/10-dollar-articles-plagiarism-checker/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

