<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>msn | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/msn/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>10 Dollar Articles Plagiarism Checker</title>
		<link>http://www.plagiarismtoday.com/2009/02/10/10-dollar-articles-plagiarism-checker/</link>
		<comments>http://www.plagiarismtoday.com/2009/02/10/10-dollar-articles-plagiarism-checker/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 18:12:38 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism checker]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2770</guid>
		<description><![CDATA[A new plagiarism checker promises to be both a competitor and a complement to Copyscape. But can it live up to its own marketing?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-logo.png" alt="10da-logo" title="10da-logo" width="272" height="45" class="alignleft size-full wp-image-2786" />It has become all the rage in recent months for programmers to build or revamp plagiarism checkers using Google and other search engines. Most of these plagiarism checkers, <a href="http://www.plagiarismtoday.com/2008/12/16/review-the-plagiarism-checker/">such as the &#8220;Dustball&#8221; checker</a>, fail to produce adequate results. </p>
<p>The problem is that phrase selection is not a simple task. It can be difficult for human beings to determine what phrases or sentences to search for, let alone a simple algorithm. As a result, such simplistic plagiarism checkers often either miss a large number of results by choosing phrases that don&#8217;t work well with the search engines or produce a slew of false positives by selecting terms that are too common or too short.</p>
<p>Thus, when I <a href="http://www.10dollararticles.com/blog/free-beta-plagiarism-checker/28/">read about a new SE-based plagiarism checker</a>, this one <a href="http://www.10dollararticles.com/plagiarism-checker.htm">by SEO content writing service 10 Dollar Articles</a> (10DA), I was skeptical at best.</p>
<p>Though a cursory search confirmed many of my original suspicions, it also showed that the plagiarism checker isn&#8217;t quite as useless as many of its brethren. Though it has its flaws and certainly isn&#8217;t as useful as its marketing might say, it does have some interesting features and potentially compelling uses.<span id="more-2770"></span></p>
<h4>How it Works</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-sample.png" alt="10da-sample" title="10da-sample" width="269" height="101" class="alignright size-full wp-image-2790" />The 10DA checker works like many similar services. Users copy an article or other piece of content that they want to check for plagiarism, choose up to three search services and hit the submit button.</p>
<p>The service then selects a series of five or six snippets from the work and runs them through each of the selected search engines. When it&#8217;s done, it links to each of the results pages and the user can go through the results to see if there are any suspicious matches.</p>
<p>The site keeps track of which results pages you have already visited, turning those numbers to black, and also lets you recheck the article with a different set of search engines.</p>
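<p>The workflow above can be sketched in a few lines of Python. This is only an illustration of the general idea, not 10DA&#8217;s actual code: the engine URLs are placeholders, the function names are hypothetical and the evenly spaced snippet selection is an assumption, since the service does not document how it picks its phrases.</p>

```python
from urllib.parse import quote_plus

# Hypothetical engine endpoints: placeholders, not the URLs 10DA uses.
ENGINES = {
    "engine_a": "https://a.example.com/search?q=",
    "engine_b": "https://b.example.com/search?q=",
}


def pick_snippets(text, count=6, words_per_snippet=8):
    """Pick up to `count` roughly evenly spaced word windows from the text."""
    words = text.split()
    if not words:
        return []
    last_start = max(len(words) - words_per_snippet, 0)
    step = max(last_start // max(count - 1, 1), 1)
    starts = list(range(0, last_start + 1, step))[:count]
    return [" ".join(words[s:s + words_per_snippet]) for s in starts]


def build_result_links(text, engine_names):
    """Return one quoted (exact match) search link per snippet per engine."""
    snippets = pick_snippets(text)
    links = {}
    for name in engine_names:
        base = ENGINES[name]
        links[name] = [base + quote_plus('"' + s + '"') for s in snippets]
    return links
```

<p>With three engines and six snippets this produces 18 links, each of which the user still has to open and read by hand.</p>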
<h4>The Good</h4>
<p>The real benefit of this system is that it is extremely simple to use and free. Since the product is in beta, anyone can paste text in and run it through the system.</p>
<p>The service also stands out somewhat in that it allows users to run the search through different search engines, unlike others that focus solely on Google. With the 10DA checker, you can easily search MSN and Yahoo! as well as Bloglines and more. Though many of the choices seem superfluous, especially the multiple Google services normally covered underneath the main search (such as searching either Wikipedia or Google Knol), the addition of extra choices is an interesting one. </p>
<p>That being said, it isn&#8217;t the first checker to offer this service; <a href="http://www.articlechecker.com/">others have been doing so for some time</a>. </p>
<p>Though it is unclear how much benefit one gets from running the same article through three different engines, it is easy to see how those eager to be extremely thorough may be tempted by that feature.</p>
<h4>The Bad</h4>
<p>Where the 10DA checker struggles the most is in the value that it adds, or lack thereof. Where Copyscape compiles the results from the various Google queries it makes and displays them in a simple results page, 10DA requires users to click through to each individual results page and do the actual legwork themselves. At this time, the 10DA checker does not even provide an indication of the number of matches in the specific results pages.</p>
<p>Due to this, the results that one gets from the 10DA checker could be easily replicated by going to the individual search engines and doing the searches for yourself. The 10DA checker does not even automatically select to view similar matches, meaning that the initial display only includes one or two copies of the work in question.</p>
<p>With no match highlighting, organization or other input from the system, essentially it is the same as performing 5-18 individual searches at once. Since only one search is usually all that is necessary to prove that a work is plagiarized, one has to wonder how useful this really is.</p>
<h4>My Tests</h4>
<p>As with most plagiarism checkers I review, I ran the site through a short series of tests to see how the results compared with stock Google searches. Since the system still primarily uses Google, this would be a true &#8220;apples to apples&#8221; comparison.</p>
<p>The first test involved a prose work of mine that I know has been plagiarized many times before. I ran it through the 10DA checker and the best result of the six phrases checked in Google was 26 matches. However, after tinkering with the search term, namely by shortening it and removing punctuation, I was able to improve it to 31 results.</p>
<table cellspacing=15>
<tr>
<td><strong>10DA Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-results.png" alt="10da-results" title="10da-results" width="284" height="99" class="alignnone size-full wp-image-2771" /></td>
</tr>
<tr>
<td><strong>Tweaked Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/my-results.png" alt="my-results" title="my-results" width="284" height="99" class="alignnone size-full wp-image-2772" /></td>
</tr>
</table>
<p>The reason for this is that the phrases the 10DA checker chooses seem, to me, to be extremely long. Where I can usually find a good statistically improbable phrase between 7-9 words long, all of the phrases chosen by the 10DA checker were over a dozen words and some ran as long as 19. </p>
<p>Though the longer strings do reduce false positives, choosing a good unique phrase is more important in that regard. This is something that the 10DA checker struggled with as some of the results had only one match, indicating that the phrase selected was of poor quality.</p>
<p>I also quickly tested the checker with a poem that I knew to be heavily plagiarized. However, many of the searches, due to an issue with apostrophes, came back as false negatives. Of those that did return matches, the highest had 25 results but, once again, by tweaking the search term, I was able to increase that number to 28. However, using my own phrase, I was able to find several hundred results.</p>
<table cellspacing=15>
<tr>
<td><strong>10DA Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-results12.png" alt="10da-results12" title="10da-results12" width="263" height="74" class="alignnone size-full wp-image-2775" /></td>
</tr>
<tr>
<td><strong>Tweaked Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-myresults2.png" alt="10da-myresults2" title="10da-myresults2" width="263" height="74" class="alignnone size-full wp-image-2776" /></td>
</tr>
<tr>
<td><strong>My Phrase Results:</strong></td>
<td><img src="http://www.plagiarismtoday.com/wp-content/uploads/2009/02/10da-myresults3.png" alt="10da-myresults3" title="10da-myresults3" width="263" height="74" class="alignnone size-full wp-image-2777" /></td>
</tr>
</table>
<p><em>(Note: The high number of results from my phrase is likely due in large part to matches on the same domain. However, in a cursory check of the first few pages of results, I did see at least some positive matches that were not in the first two.)</em></p>
<p>The end result is that most people will find it pretty trivial to get better results than the 10DA checker. If they can look at the phrase selected, remove punctuation and pull out a good section of unique content, they can increase the effectiveness of the search. </p>
<p>However, why one would do that is a bit of a mystery. If you&#8217;re going through all of these motions and need the added matches that come from a better phrase, you&#8217;re probably going to find it faster and easier just to pull the phrase yourself directly from the content and then perform your own search.</p>
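<p>For those who do want to automate the tweak described above, here is a minimal Python sketch. It is not the method 10DA uses, nor exactly what I did by hand: the function names are hypothetical, the nine-word cap simply reflects the 7-9 word range I mentioned earlier and deleting apostrophes outright is an assumption about what a given search engine will tolerate.</p>

```python
import string


def tidy_phrase(text, max_words=9):
    """Delete punctuation (including curly quotes) and keep the first few words."""
    drop = string.punctuation + "\u2018\u2019\u201c\u201d"
    cleaned = text.translate(str.maketrans("", "", drop))
    words = cleaned.split()
    return " ".join(words[:max_words])


def exact_match_query(text, max_words=9):
    """Wrap the tidied phrase in double quotes for an exact-phrase search."""
    return '"' + tidy_phrase(text, max_words) + '"'
```

<p>Note that this only shortens and cleans a phrase. Picking which part of the work is actually unique still takes a human eye.</p>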
<h4>Conclusions</h4>
<p>Even though the site&#8217;s marketing material says that it is both a competitor and a complement to Copyscape, Copyscape is by far a more useful service. Though 10DA seems to be about on par with the number of matches Copyscape catches, the usability of Copyscape is much higher and well worth the five cents per search in most cases.</p>
<p>Still, if you&#8217;re looking to do a quick plagiarism check of an article before you post it on your site, something my wife has to do as her company&#8217;s blog editor, it might be a useful service. If you don&#8217;t feel like setting up a Copyscape account or don&#8217;t mind the extra step of visiting the results, then it could be useful.</p>
<p>However, I cannot recommend this service for checking for duplicate content of your site&#8217;s material. You can get more accurate matches by hand and the amount of energy that is saved by using the 10DA checker is pretty minimal. Even the free version of Copyscape provides good matching and much better usability.</p>
<p>But even that seems somewhat defeatist. With <a href="http://fairshare.cc">Fairshare</a> bringing <a href="http://www.plagiarismtoday.com/2009/02/03/attributor-announces-fairshare-service/">professional-grade matching technology and automatic updates to bloggers</a>, there is no reason that bloggers or other RSS providers should be punching in their articles by hand to check for plagiarism.</p>
<p>Static content may have different needs, but with <a href="http://www.plagiarismtoday.com/2008/01/24/video-how-to-use-google-alerts/">Google Alerts</a> and <a href="http://www.plagiarismtoday.com/2008/07/01/bitscan-release-copy-alerts/">CopyAlerts</a>, there is little reason to manually check those results either.</p>
<p>In short, the age of copying and pasting textual content to see where it has appeared on the Web is fast ending. That is good news though as the easier to use and more automated the systems become, the more likely bloggers and other writers are to use them.</p>
<p>Hopefully, similar systems for images, audio and video are also fast coming.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2009/02/10/10-dollar-articles-plagiarism-checker/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Weekend Linkroll &#8211; 04-26-08</title>
		<link>http://www.plagiarismtoday.com/2008/04/26/weekend-linkroll-04-26-08/</link>
		<comments>http://www.plagiarismtoday.com/2008/04/26/weekend-linkroll-04-26-08/#comments</comments>
		<pubDate>Sat, 26 Apr 2008 15:50:15 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Linkblog]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[fashion]]></category>
		<category><![CDATA[lessig]]></category>
		<category><![CDATA[limbaugh]]></category>
		<category><![CDATA[linblog]]></category>
		<category><![CDATA[linkroll]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[MPAA]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[Orphan Works]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[professor lessig]]></category>
		<category><![CDATA[RIAA]]></category>
		<category><![CDATA[rush limbaugh]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=982</guid>
		<description><![CDATA[It was yet another busy week for copyright news with the new orphan works legislation, big developments on the RIAA front, The Pirate Bay passing a milestone and much, much more. ]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/msn-music-logo.jpg" alt="" title="msn-music-logo" width="169" height="44" class=" picleft alignleft size-full wp-image-983" />The &#8220;new again&#8221; orphan works legislation dominated this week&#8217;s copyright news; however, there were still plenty of other stories to cover. </p>
<p>We have multiple tales from the RIAA and their battle against file sharing, news that P2P is no longer the big bandwidth hog, Microsoft destroying DRM keys and a fashion designer suing a charity operation over a drawing of a handbag. </p>
<p>Finally, in weird copyright news, we have Professor Lessig drawing the ire of Rush Limbaugh over his tastes in mashups and watermarked art making its way to the cover of a brand-new Wii game. </p>
<p>Remember, as usual, this week&#8217;s linkroll is a &#8220;raw&#8221; link list. Some stories are duplicated, some do not point to their original sources and some may not be accurate. A great deal of refining goes into producing the show notes for the Copyright 2.0 Show.</p>
<p><span id="more-982"></span><br />
<script src="http://www.diigo.com/roll2/linkrolls?username=plagiarismtoday&amp;count=50&amp;style=customize&amp;icon=false&amp;l_type=0&amp;t_color=920D02&amp;t_fam=Verdana,sans-serif&amp;t_size=14&amp;t_bold=true&amp;t_italic=false&amp;t_underline=false&amp;i_fam=Verdana,sans-serif&amp;i_color=920D02&amp;i_size=12&amp;i_bold=false&amp;i_italic=false&amp;i_underline=false&amp;bg_color=FFFFFF&amp;bg_repeat=no-repeat&amp;title=Week%20Ending%2004-26-08&amp;tags=56&amp;bg_img=" type="text/javascript"></script><noscript>Your RSS reader/browser does not support JavaScript, please click through for the full article.</noscript></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/04/26/weekend-linkroll-04-26-08/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Engine Showdown: Testing Plagiarism Detection</title>
		<link>http://www.plagiarismtoday.com/2008/02/14/search-engine-showdown-testing-plagiarism-detection/</link>
		<comments>http://www.plagiarismtoday.com/2008/02/14/search-engine-showdown-testing-plagiarism-detection/#comments</comments>
		<pubDate>Thu, 14 Feb 2008 20:02:34 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[ask]]></category>
		<category><![CDATA[clusty]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism-detection]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[searching]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2008/02/14/search-engine-showdown-testing-plagiarism-detection/</guid>
		<description><![CDATA[How does your search engine do when detecting plagiarism? I've tested five different search engines and how many matches of my work they find. These tests provide some surprising results and offer insight into which services are best for plagiarism detection. ]]></description>
			<content:encoded><![CDATA[<p><img SRC="http://img.skitch.com/20080214-kwq939meab6bpem3aq9yyfntwa.png" align="left" class="picleft"/>Though Google is by far the search leader in most of the World, it is not the only tool available. Other companies provide similar services and many of them have strong followings.</p>
<p>However, determining which search engine is &#8220;better&#8221; for most kinds of searching is difficult. It doesn&#8217;t matter how many results a search engine returns, but rather, how relevant and helpful those results are. That is a very subjective standard and one person could be very happy with a result while another is completely unsatisfied.</p>
<p>But there is one area where more is generally seen as better: plagiarism detection. When looking for copies of your own work, you want to find as many matches as possible, regardless of the order they are put in.</p>
<p>Unfortunately, that is a fairly specialized area of detection and one that not all search engines are well equipped for. Your typical search engine user looks for simple keywords and general information, not long phrases and specific copies. As such, most search engines don&#8217;t invest heavily in improving these types of results.</p>
<p>But for those interested in searching for the content and detecting plagiarism, this raises the question &#8220;Which search engine is the best?&#8221; To help decide that, I put five of the top search engines through a battery of tests and the results were surprising.<br />
<span id="more-819"></span></p>
<h4>The Tests</h4>
<p>To cover as much of the market as possible, I decided to test the four top search engines, Google, Yahoo!, MSN and Ask, as well as an upstart meta search engine, Clusty.</p>
<p>I first tested their ability to detect static content that was copied. I ran a total of nine searches through each of the search engines, each using a statistically improbable phrase from a different work I&#8217;ve written over the years, and measured the number of results returned.</p>
<p>The first three searches dealt with poetry, which used shorter phrases and involved very high amounts of plagiarism/copying. Then I tested using three short stories, which have significantly less copying but longer phrases. Finally, I tested with three shorter prose pieces, such as essays, that are average in both copying and in phrase length.</p>
<p>When searching for the phrases, I included all supplemental results and included different pages on the same site since they are often separate infringements. I did, however, browse through the results in order to seek out and eliminate any sites that contained the phrase I searched for, but not the actual work. That turned out to be unnecessary though as all results were relevant.</p>
<p>After those tests, I did one final search to test the ability of the search engines to detect plagiarism of dynamic content by using my <a href="http://www.plagiarismtoday.com/2006/10/05/update-digital-fingerprint-plugin-beta-2/">Digital Fingerprint</a>. </p>
<p>These tests are not scientific and are not designed to be the end all of search engine effectiveness in this area. Rather, it is just a quick and dirty look at how many results they return for searches for my own content and it is designed to give an idea of which search engines might be worth considering.</p>
<p>Your results may vary.</p>
<h4>Static Content</h4>
<p>The results of the static content searches are below:</p>
<table cellspacing=10>
<tr>
<td></td>
<td><strong>Google</strong></td>
<td><strong>Yahoo!</strong></td>
<td><strong>MSN</strong></td>
<td><strong>Ask</strong></td>
<td><strong>Clusty</strong></td>
<td><strong>Winner</strong></td>
</tr>
<tr>
<td><strong>Poem 1</strong></td>
<td>72</td>
<td>81</td>
<td>29</td>
<td>18</td>
<td>42</td>
<td>Yahoo!</td>
</tr>
<tr>
<td><strong>Poem 2</strong></td>
<td>29</td>
<td>25</td>
<td>18</td>
<td>10</td>
<td>27</td>
<td>Google</td>
</tr>
<tr>
<td><strong>Poem 3</strong></td>
<td>21</td>
<td>29</td>
<td>10</td>
<td>6</td>
<td>14</td>
<td>Yahoo!</td>
</tr>
<tr>
<td><strong>Story 1</strong></td>
<td>0</td>
<td>2</td>
<td>2</td>
<td>0</td>
<td>2</td>
<td>Y/M/C</td>
</tr>
<tr>
<td><strong>Story 2</strong></td>
<td>0</td>
<td>3</td>
<td>2</td>
<td>2</td>
<td>4</td>
<td>Clusty</td>
</tr>
<tr>
<td><strong>Story 3</strong></td>
<td>0</td>
<td>3</td>
<td>2</td>
<td>1</td>
<td>2</td>
<td>Yahoo!</td>
</tr>
<tr>
<td><strong>Prose 1</strong></td>
<td>8</td>
<td>8</td>
<td>5</td>
<td>4</td>
<td>10</td>
<td>Clusty</td>
</tr>
<tr>
<td><strong>Prose 2</strong></td>
<td>12</td>
<td>5</td>
<td>9</td>
<td>2</td>
<td>12</td>
<td>G/C</td>
</tr>
<tr>
<td><strong>Prose 3</strong></td>
<td>19</td>
<td>16</td>
<td>8</td>
<td>7</td>
<td>12</td>
<td>Google</td>
</tr>
<tr>
<td colspan=6></td>
</tr>
<tr>
<td><strong>Totals</strong></td>
<td>161</td>
<td>172</td>
<td>85</td>
<td>50</td>
<td>125</td>
<td>Yahoo!</td>
</tr>
<tr>
<td><strong>Rounds Won/Tied</strong></td>
<td>3</td>
<td>4</td>
<td>1</td>
<td>0</td>
<td>4</td>
<td>Y/C</td>
</tr>
</table>
<p>When looking at the results, several things become clear. First is that Yahoo! was the winner in terms of the total number of results returned, while tying Clusty at four rounds won or tied. Google performed very well in all of the tests, save those for the short stories. This is interesting because Google failed to even return my own site as a result on all three occasions and no amount of tweaking the phrase (removing punctuation, shortening, etc.) seemed to help. Quick tests for other short stories on my sites had similar results, returning either just one hit or none.</p>
<p>The surprise of the test results was Clusty, which performed reasonably well in all tests, and even won or tied in more rounds than Google. Though its total number of results was significantly lower than that of either Yahoo! or Google, it established itself as a top-tier candidate in this area.</p>
<p>MSN and Ask, however, performed poorly all around. MSN returned less than half the results of Yahoo! and Ask returned less than one third. The best MSN could do was tie in one of the story rounds while Ask was unable to win or tie in any of the searches.</p>
<p>In the end, Google and Yahoo! finished neck and neck, with Yahoo! taking a slight lead due to Google&#8217;s strange performance in the story rounds. Clusty is right behind both of them and MSN and Ask are both left in the dust.</p>
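<p>For anyone who wants to check the arithmetic behind these comparisons, the short Python sketch below re-adds the per-engine totals from the table above and expresses each one as a share of the leader&#8217;s total. The numbers are copied straight from the table; the function names are my own and purely illustrative.</p>

```python
# Match counts copied from the static content table above, in row order:
# Poem 1-3, Story 1-3, Prose 1-3.
RESULTS = {
    "Google": [72, 29, 21, 0, 0, 0, 8, 12, 19],
    "Yahoo!": [81, 25, 29, 2, 3, 3, 8, 5, 16],
    "MSN": [29, 18, 10, 2, 2, 2, 5, 9, 8],
    "Ask": [18, 10, 6, 0, 2, 1, 4, 2, 7],
    "Clusty": [42, 27, 14, 2, 4, 2, 10, 12, 12],
}


def totals(results):
    """Sum the match counts for each engine."""
    return {engine: sum(counts) for engine, counts in results.items()}


def share_of_leader(results):
    """Return the top engine and every engine's total as a fraction of it."""
    sums = totals(results)
    leader = max(sums, key=sums.get)
    shares = {engine: round(value / sums[leader], 2) for engine, value in sums.items()}
    return leader, shares
```

<p>Running <code>share_of_leader(RESULTS)</code> puts MSN at roughly 0.49 of Yahoo!&#8217;s total and Ask at roughly 0.29, which is where the &#8220;less than half&#8221; and &#8220;less than one third&#8221; figures come from.</p>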
<h4>Dynamic Content</h4>
<p>I decided to do a similar test using my digital fingerprint, a string of eight semi-random characters, to see which search engine could find more copies of my content from this site on the Web. In light of the previous results, these were very shocking. </p>
<table cellspacing=10>
<tr>
<td></td>
<td><strong>Google</strong></td>
<td><strong>Yahoo!</strong></td>
<td><strong>MSN</strong></td>
<td><strong>Ask</strong></td>
<td><strong>Clusty</strong></td>
<td><strong>Winner</strong></td>
</tr>
<tr>
<td><strong>Digital Fingerprint</strong></td>
<td>297</td>
<td>6</td>
<td>6</td>
<td>48</td>
<td>19</td>
<td>Google</td>
</tr>
</table>
<p>In terms of pure count, Google blew away the competition, returning over six times as many results as its nearest competitor, Ask. With almost 300 results, Google stood tall in this division. Ask was able to locate 48 results and Clusty managed to find 19 copies, but Yahoo! and MSN were each only able to locate six pages with my digital fingerprint.</p>
<p>However, this trouncing comes with a caveat. Google&#8217;s results were filled to the brim with pages that, most likely, did not belong in the search engine index at all. Aside from spam blogs and other regular sources of content theft, there were a lot of legitimate sources such as other search engines, blog indexes and caches that should not have had their sites indexed by Google. </p>
<p>It is going to take a more thorough analysis to see if Google actually caught more scraping or if they are simply indexing more sources of legitimate reuse. </p>
<p>Though I agree more is typically better, it seems likely to me that the Google result is highly inflated and the sites I visited seem to indicate that.</p>
<h4>&#8220;Bonus&#8221; Round</h4>
<p>In one final test, I ran a line from Shakespeare&#8217;s &#8220;Hamlet&#8221; into the five search engines to see how many copies of the work it detected. Once again, the results were surprising.</p>
<table cellspacing=10>
<tr>
<td></td>
<td><strong>Google</strong></td>
<td><strong>Yahoo!</strong></td>
<td><strong>MSN</strong></td>
<td><strong>Ask</strong></td>
<td><strong>Clusty</strong></td>
<td><strong>Winner</strong></td>
</tr>
<tr>
<td><strong>Hamlet</strong></td>
<td>15800</td>
<td>27000</td>
<td>7330</td>
<td>2030</td>
<td>2010</td>
<td>Yahoo!</td>
</tr>
</table>
<p>This time around, the results more closely mirrored the first round of testing but showed Yahoo! with a much larger advantage. The other three were left in the dust, with Clusty, this time, bringing up the rear. </p>
<p>It is important to note though that, with these kinds of numbers, the results counters are notoriously unreliable and this test was not intended to be serious in any way. It was merely a curiosity that drove me to see how one of the most copied works in history was being used on the Web.</p>
<p>The one thing it did prove conclusively is that, no matter what, I am no Shakespeare. </p>
<h4>Caveats</h4>
<p>As mentioned earlier, these tests were not designed to be scientific, but rather, were simply &#8220;quick and dirty&#8221; checks to see how the different search engines handled the same queries. Different people will likely get different results and the conclusions that one can draw from this are, admittedly, very narrow.</p>
<p>Any researchers interested in replicating my study can <a href="http://www.plagiarismtoday.com/contact-pt/">contact me</a> and I&#8217;ll send them the exact queries used and works checked for. I don&#8217;t wish to post them here as doing so could further skew future tests should this post be scraped.</p>
<p>All in all, the goal of this test is not to set any rules about who to trust with your plagiarism searches, but rather, where to start looking.</p>
<h4>Conclusions</h4>
<p><img SRC="http://img.skitch.com/20080214-fbbfky5dxp9wtxfn9q641b4c54.png" align="right" class="picright"/>Personally, looking at these test results, I would say that both Google and Yahoo! are acceptable solutions for detecting static content. However, since Google offers <a href="http://www.google.com/alerts">Google Alerts</a>, an email service that automates most search functions, it is safe to say that, for most, it will remain the default for plagiarism searching, even though Yahoo! produced slightly more results.</p>
<p>However, there is obviously a benefit to broadening your search horizons and incorporating both Yahoo! and Clusty into your efforts. Both sites produced better results than Google on many tests and often caught sites that Google missed. </p>
<p>Another thing that becomes clear is that MSN and Ask, though likely great search engines in other regards, both lag far behind in this area and in none of my searches did I notice Ask or MSN picking up copies that were clearly missed by the other sites. They might be worth experimenting with further, but are likely not the best places to start.</p>
<p>In the end, these tests seem to have achieved their goals by giving us an idea of where to begin and what services we should likely avoid. Your mileage may vary, but these results do seem to speak for themselves. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/02/14/search-engine-showdown-testing-plagiarism-detection/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The DMCA on 7 Blog Hosts</title>
		<link>http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/</link>
		<comments>http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/#comments</comments>
		<pubDate>Thu, 06 Sep 2007 20:07:09 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[DMCA Seven]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[AOL]]></category>
		<category><![CDATA[Blogger]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[blogsome]]></category>
		<category><![CDATA[Blogspot]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Livejournal]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[sixapart]]></category>
		<category><![CDATA[typepad]]></category>
		<category><![CDATA[windows-live-spaces]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/</guid>
		<description><![CDATA[For the next chapter in the &#8220;DMCA Seven&#8221; series, we&#8217;re taking a look at one of the most common types of hosts out there, blog hosts. Many of these hosts have been copyright headaches for Webmasters. They are prime targets for spam blogs and scrapers and some have played a huge role in rise of...]]></description>
			<content:encoded><![CDATA[<p>For the next chapter in the &#8220;DMCA Seven&#8221; series, we&#8217;re taking a look at one of the most common types of hosts out there, blog hosts. </p>
<p>Many of these hosts have been copyright headaches for Webmasters. They are prime targets for spam blogs and scrapers and some have played a huge role in the rise of the &#8220;splogosphere&#8221;. </p>
<p>Without the help of these hosts, the copyrights of bloggers will be almost impossible to protect, especially for smaller rightsholders that cannot afford attorneys to go after plagiarists. That makes the DMCA/copyright policies of these hosts a matter of critical importance to the rest of us on the Web.</p>
<p>So how do they measure up? Let&#8217;s take a look at seven of the leaders and find out. </p>
<p><span id="more-624"></span><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/blogger2png/' rel='attachment wp-att-632' title='blogger2.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/blogger2.png' alt='blogger2.png' /></a></p>
<p><strong>Format:</strong> Email<br />
<strong>Email Address:</strong> amac at google dot com<br />
<strong>Location of Policy:</strong> <a href="http://www.google.com/dmca.html">Google&#8217;s DMCA Policy</a><br />
<strong>Registered with USCO:</strong> <a href="http://www.copyright.gov/onlinesp/agents/google.pdf">Yes</a><br />
<strong>Comments:</strong> Every time I do one of these articles, it seems that Google owns one of the seven leading properties and I&#8217;m forced to cover them again and rehash the same complaints. To summarize, their requirement of a handwritten signature needlessly complicates the process of filing a DMCA complaint and, most likely, does not comply with the law, in particular the <a href="http://www.ftc.gov/os/2001/06/esign7.htm">ESIGN Act</a>. Until they are able to accept email complaints without PDF trickery, they will be a major thorn in Webmasters&#8217; sides and, in this case, a great target for spam bloggers/scrapers.<br />
<strong>Grade:</strong> D</p>
<p><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/livespacespng/' rel='attachment wp-att-629' title='livespaces.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/livespaces.png' alt='livespaces.png' /></a></p>
<p><strong>Format:</strong> Email<br />
<strong>Email Address:</strong> jkweston at microsoft dot com<br />
<strong>Location of Policy:</strong> <a href="http://www.microsoft.com/info/cpyrtInfrg.htm">Microsoft&#8217;s Copyright Policy</a><br />
<strong>Registered with USCO:</strong> <a href="http://www.copyright.gov/onlinesp/agents/msft.pdf">Yes</a><br />
<strong>Comments:</strong> Be grateful you read this site and be sure to bookmark this article. Otherwise, the odds of you quickly finding the copyright policy for MSN Live Spaces are slim to none. Their <a href="http://support.live.com/default.aspx?productKey=wlspacesabuse&#038;mkt=en-ww">report abuse page</a> gives you a drop down to report copyright infringement. However, sending a report there, even with a full DMCA notice, only results in an autoreply directing you to follow the guide above (as reported by visitors to this site). You can also find a mention of the policy on their <a href="http://tou.live.com/en-us/default.aspx">legal page</a>, but the link to the actual policy isn&#8217;t even clickable. You have to literally copy and paste the URL to use it. The policy itself is fairly complete, containing all of the necessary information, but finding it is a pain. You can&#8217;t even do a Google search for it. It is as if Microsoft is deliberately hiding this page. It may not be illegal, but it is some of the worst service I have ever seen.<br />
<strong>Grade:</strong> C-</p>
<p><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/aolpeoplepng/' rel='attachment wp-att-625' title='aolpeople.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/aolpeople.png' alt='aolpeople.png' /></a></p>
<p><strong>Format:</strong> <a href="http://about.aol.com/aolnetwork/info_notify">Form</a>/Email<br />
<strong>Email Address:</strong> aolcopyright at aol dot com<br />
<strong>Location of Policy:</strong> <a href="http://about.aol.com/aolnetwork/info_notify">AOL&#8217;s Copyright Infringement Policy</a><br />
<strong>Registered with USCO:</strong> <a href="http://www.copyright.gov/onlinesp/agents/aol.pdf">Yes</a><br />
<strong>Comments:</strong> AOL may be considered a dinosaur on the Web, but they do exhibit some forward thinking in this area. They provide a very convenient form (linked above) for filing complaints of copyright infringement that takes care of most of the dirty work for you. It&#8217;s definitely one of the easiest ways to report an infringement I&#8217;ve seen. The only problem with AOL&#8217;s policy is that it is also very hard to find. To get to that form you have to find a link buried in their <a href="http://about.aol.com/aolnetwork/info_notify">terms of use</a> to get to their infringement policy and, from there, click their &#8220;Copyright Notice&#8221; link to get to the form itself. The policy also doesn&#8217;t provide an email address, save on their USCO registration, but is otherwise complete, giving all of the necessary information to file a notice. Though I am irked by the location of the policy and the roadblocks in finding it, it is overall a solid way to handle such matters.<br />
<strong>Grade:</strong> B+</p>
<p><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/wordpresslogopng/' rel='attachment wp-att-631' title='wordpresslogo.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/wordpresslogo.png' alt='wordpresslogo.png' /></a></p>
<p><strong>Format:</strong> Email<br />
<strong>Email Address:</strong> dmca at automattic dot com<br />
<strong>Location of Policy:</strong> <a href="http://automattic.com/dmca/">Automattic&#8217;s DMCA Policy</a><br />
<strong>Registered with USCO:</strong> No<br />
<strong>Comments:</strong> Though Automattic is <a href="http://www.plagiarismtoday.com/2007/04/09/why-wordpresscom-is-virtually-spam-free/">definitely on the offensive against spam</a>, their DMCA policy leaves a lot to be desired. While the actual policy is very complete, though omitting a fax number, finding it is a pain. The link to it is buried in the terms of service and they have not registered with the USCO meaning that there is no alternative way to look up the information. Though they have a great reputation for handling such issues once notified, their front end and user-friendliness could definitely use some work. Though the policy is likely within the bounds of the law, the lack of USCO registration and the difficult location discourage me greatly and raise some potential legal issues.<br />
<strong>Grade:</strong> D+</p>
<p><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/typepadlogopng/' rel='attachment wp-att-630' title='typepadlogo.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/typepadlogo.png' alt='typepadlogo.png' /></a></p>
<p><strong>Format:</strong> Email<br />
<strong>Email Address:</strong> copyright at typepad dot com<br />
<strong>Location of Policy:</strong> <a href="http://support.typepad.com/cgi-bin/typepad.cfg/php/enduser/std_adp.php?p_faqid=910">Typepad&#8217;s Copyright Policy</a><br />
<strong>Registered with USCO:</strong> <a href="http://www.copyright.gov/onlinesp/agents/sixapart.pdf">Yes</a><br />
<strong>Comments:</strong> One of two SixApart services we&#8217;re going to cover, Typepad is a breath of fresh air when stacked up against its competitors. The copyright policy is well-written, robust and complete. It is linked at the bottom of most SixApart-controlled pages, including the home page, and is very easy to find. They are also registered with the USCO and their information there is up to date. Though some issues exist on the backend, in particular with other SixApart services, they definitely understand how to comply with the DMCA and make the process as painless as possible for the end user.<br />
<strong>Grade:</strong> B+</p>
<p><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/livejournallogopng/' rel='attachment wp-att-628' title='livejournallogo.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/livejournallogo.png' alt='livejournallogo.png' /></a></p>
<p><strong>Format:</strong> Email<br />
<strong>Email Address:</strong> copyright at livejournal dot com<br />
<strong>Location of Policy:</strong> <a href="http://www.livejournal.com/legal/dmca.bml">LiveJournal&#8217;s Copyright Policy</a><br />
<strong>Registered with USCO:</strong> <a href="http://www.copyright.gov/onlinesp/agents/sixapart.pdf">Yes</a><br />
<strong>Comments:</strong> The other SixApart service shares its policy with its brother. In fact, at first glance, the text of the two policies appears to be identical. As with TypePad, the copyright policy is linked at the footer of nearly every LiveJournal-controlled page, including the home page, and offers the same level of completeness. Though I would prefer the policy, along with other abuse information, be available on all pages, including individual blogs, the benefit of that would be minimal as most people will simply visit the home page.<br />
<strong>Grade:</strong> B+</p>
<p><a href='http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/blogsomelogopng/' rel='attachment wp-att-627' title='blogsomelogo.png'><img src='http://www.plagiarismtoday.com/wp-content/uploads/2007/09/blogsomelogo.png' alt='blogsomelogo.png' /></a></p>
<p><strong>Format:</strong> Email?<br />
<strong>Email Address:</strong> legal at blogsome dot com?<br />
<strong>Location of Policy:</strong> <a href="http://www.blogsome.com/termsofservice.php">Blogsome Terms of Service</a><br />
<strong>Registered with USCO:</strong> No<br />
<strong>Comments:</strong> Blogsome talks one of the toughest games when it comes to copyright infringement. Their modest-length terms of service use the word &#8220;copyright&#8221; ten times. However, there are no teeth to this policy. There is no address given to contact about copyright infringement. Though they are an Irish site and do not have to follow the DMCA, there are EU regulations at play and, without a clear means of contact for reporting abuses, I am very worried about how this site might be misused. Hopefully, they will update their policies soon to make it clearer where rightsholders should take such matters. The address above is the only account I could find used in relation to the terms of service in any way, shape or form.<br />
<strong>Grade:</strong> F</p>
<p><strong>Wrap Up</strong></p>
<p>It is an Alice in Wonderland moment for me. SixApart, who has <a href="http://www.plagiarismtoday.com/2007/04/03/six-apartrojo-now-spam-bloggers/">drawn criticism for their copyright policies elsewhere</a>, has the best public DMCA policies of all the services listed. WordPress, the dedicated spam fighter, has one of the worst.</p>
<p>Equally strange, the big companies, Microsoft and Google, both have obstructionist policies while the dinosaur AOL and small business SixApart have much more effective and open ones. </p>
<p>Overall, though, I was very disappointed in the policies of the major blog hosts. Most have shown little interest in working with rightsholders and some are downright uncooperative. With one F, two Ds, a C- and three B+s, two belonging to SixApart, this is easily one of the worst genres I have reviewed.</p>
<p><strong>Conclusions</strong></p>
<p>In the end, none of the blogging companies earned an A. They all had at least some issues with their policies and, despite being some of the largest, most important hosts on the Web, only one showed any kind of evolution or forward-thinking in this area. </p>
<p>When stacked against the <a href="http://www.plagiarismtoday.com/2007/08/30/the-dmca-on-7-video-sites-youtube-beats-viacom/">video sharing sites</a>, it becomes clear what a difference the threat of lawsuits makes to a company. Where the video sharing sites have taken great care in crafting their policies, the blog hosts have largely just thrown theirs together, working only to meet the minimum standards they feel they should be held to.</p>
<p>Looking at these policies and going back over my personal experiences with many of these hosts, it is no wonder how and why the spam blog problem grew to the proportions that it did. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/09/06/the-dmca-on-seven-blog-hosts/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

