<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search-Engines | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/search-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 17:55:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>3 Count: MegaDeletion</title>
		<link>http://www.plagiarismtoday.com/2012/01/30/3-count-megadeletion/</link>
		<comments>http://www.plagiarismtoday.com/2012/01/30/3-count-megadeletion/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 15:18:41 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Copyright News]]></category>
		<category><![CDATA[6waves]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[deletion]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[megaupload]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[spry fox]]></category>
		<category><![CDATA[triple town]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[yeti town]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=12468</guid>
		<description><![CDATA[Megaupload's data could be deleted this week, UK government asks search engines to de-rank pirate sites and game plagiarism leads to a lawsuit.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/07/3count004-trim.png" alt="" title="3count004-trim" class="alignleft size-full wp-image-7303" height="162" width="175"></p>
<p><em>Have any suggestions for the 3 Count? Let me know via Twitter <a href="http://twitter.com/plagiarismtoday">@plagiarismtoday</a>.</em></p>
<h4>1: <a href="http://www.washingtonpost.com/business/technology/megaupload-data-could-be-deleted-starting-thursday/2012/01/30/gIQAeggGcQ_story.html">Megaupload Data Could be Deleted Starting Thursday</a></h4>
<p>First off today, according to government sources, Megaupload&#8217;s data, including the files of its millions of users, could be deleted by Thursday. The move comes as the government says that it is done accessing the data that formerly made up Megaupload&#8217;s site and the contractors that were helping host the site cannot be paid as the company&#8217;s assets are frozen. As such, they&#8217;ve asked permission to delete the data to make room for other customers. However, Megaupload&#8217;s attorney is hoping to find a way both to use the files in Megaupload&#8217;s defense and to make the files available again to the users who uploaded them, so they won&#8217;t be lost forever.</p>
<h4>2: <a href="http://torrentfreak.com/copyright-industry-calls-for-broad-search-engine-censorship-120127/">Copyright Industry Calls For Broad Search Engine Censorship</a></h4>
<p>Next up today, in the UK, the Department for Culture, Media and Sport gave a proposal to the major search engines, including Google, Bing and Yahoo, asking them to de-rank or remove search results for sites that routinely infringe copyright and to bolster the ranking of legitimate sites. According to the proposal, such an effort would already be in line with their existing policies on favoring sites that meet certain quality standards. Critics of the proposal, however, are calling this a form of search engine censorship. For now, the proposal is non-binding and is merely a suggested set of guidelines for the search engines to follow. </p>
<h4>3: <a href="http://www.gamezebo.com/news/2012/01/29/triple-town-developer-files-copyright-infringement-suit-over-yeti-town">Triple Town developer Files Copyright Infringement Suit Over Yeti Town</a></h4>
<p>Finally today, the gaming company Spry Fox has filed a copyright infringement suit against competing game maker 6Waves LOLAPPS over a case of alleged plagiarism. According to Spry Fox, they were in talks with 6Waves LOLAPPS to have the latter publish a game created by Spry Fox named &#8220;Triple Town&#8221;, which was already a popular Facebook game. However, according to Spry Fox, 6Waves LOLAPPS took the opportunity to develop a clone of the game named &#8220;Yeti Town&#8221; and then publish it to the iOS App Store, beating Spry Fox to that market. Finally, also according to Spry Fox, 6Waves LOLAPPS sent a message via Facebook to break off negotiations about Triple Town the day Yeti Town was published.</p>
<h4>Suggestions</h4>
<p>That&#8217;s it for the three count today. We will be back tomorrow with three more copyright links. If you have a link that you want to suggest for the column or have any proposals to make it better, feel free to leave a comment or send me an email. I hope to hear from you. </p>
<h4>Want the Full Story?</h4>
<p>Tune in <a href="http://www.plagiarismtoday.com/podcast">every Wednesday evening at 5 PM ET for the live recording of the Copyright 2.0 Show</a> or wait and get the edited version <a href="http://www.plagiarismtoday.com/category/podcast/">Friday right here on Plagiarism Today</a>. </p>
<p><em>The 3 Count Logo was created by <a rel="nofollow" href="http://www.cloudjunkies.com/">Justin Goff</a> and is licensed under a <a rel="nofollow" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution License</a>. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2012/01/30/3-count-megadeletion/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Can Blekko Help Detect Copied Content?</title>
		<link>http://www.plagiarismtoday.com/2010/11/30/can-blekko-help-detect-copied-content/</link>
		<comments>http://www.plagiarismtoday.com/2010/11/30/can-blekko-help-detect-copied-content/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 18:07:59 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[blekko]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[duplicate-content]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[seo]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=8426</guid>
		<description><![CDATA[The new search engine Blekko offers a new way to detect plagiarism and other duplicate content, but how well does it work?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/11/blekko-logo.jpg" alt="Blekko Logo Image" title="Blekko Logo Image" width="181" height="53" class="alignleft size-full wp-image-8448" /><a href="http://blekko.com/">Blekko</a> is a new search engine that is aiming at the search leaders, including Google, by offering a more open and more spam-free search experience. With a tagline &#8220;Slash the Web&#8221; <a href="http://blekko.com/ws/+/about">Blekko has laid down an Internet searcher&#8217;s bill of rights</a> that encourages users to create &#8220;slashes&#8221; and that will customize what appears in their results.</p>
<p>For example, if you search for &#8220;Phones&#8221; and add the /android slash you&#8217;ll only get results for related to the Android operating system.  Likewise, you can use slashed to manipulate the results in various ways, including selecting a date range, a political slant or only certain kinds of sites (forums, blogs, etc.).</p>
<p>Two of the more interesting slashes are <a href="http://www.labnol.org/internet/check-online-plagiarism/18120/">/duptext and /domainduptext</a>, which supposedly will check either a page or a domain&#8217;s content to find where it is being duplicated and how it is being misused. For webmasters, this could mean a powerful new tool for tracking duplicate content on the Web and tracking down those who are misusing their work.</p>
<p>So, as with other systems, I put it to the test and was, in a word, disappointed with the results. Though I think Blekko has a lot of potential in other areas, it doesn&#8217;t seem that duplicate content detection is one of its better uses, at least not at this time.<span id="more-8426"></span></p>
<h4>How Blekko&#8217;s Plagiarism Checker Works</h4>
<p>Using Blekko&#8217;s duplicate content detection system is actually fairly easy. All one has to do is search for the URL they want to check the content of and then add the /duptext tag to the end of the URL.</p>
<p>For example:</p>
<blockquote><p><a href="http://blekko.com/ws/http://www.plagiarismtoday.com/2010/11/23/5-rules-for-the-next-plagiarism-scandal/+/duptext">http://www.plagiarismtoday.com/2010/11/23/5-rules-for-the-next-plagiarism-scandal/ /duptext</a></p></blockquote>
<p>You can do this with any page on the Web and the results are usually presented in a few seconds.</p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2010/11/blekk-example.jpg" alt="Blekko Example" title="Blekko Example" width="511" height="192" class="alignnone size-full wp-image-8427" /></p>
<p>As you can see, it breaks out the information by hosts and URLs and, from there, based on those that are on-site and off-site. Below the chart is a list of links where the duplicate content is present.</p>
<p>You can also check an entire domain for duplicate content by looking for just the domain and adding &#8220;/domainduptext&#8221; to the end. For example:</p>
<blockquote><p><a href="http://blekko.com/ws/plagiarismtoday.com+/domainduptext">plagiarismtoday.com /domainduptext</a></p></blockquote>
<p>However, with this slash you get significantly less information, basically just a list of domains where your duplicate content is suspected of appearing and links to their SEO pages. </p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2010/11/blekk-example2-500x193.jpg" alt="Blekko Example 2" title="Blekko Example 2" width="500" height="193" class="alignnone size-large wp-image-8428" /></p>
<p>The question, however, is &#8220;How well does it work?&#8221; Unfortunately, after a few searches, the answer appears to be a disappointing one.</p>
<h4>Testing it Out</h4>
<p>As is typical with my tests, I decided to have Blekko do a duplicate content check on several works with a relatively well-known amount of plagiarism: two poems, one short story and one post on Plagiarism Today.</p>
<p>Here are the results of those tests:</p>
<h4>Test 1: Poem 1</h4>
<p>I tried out Blekko on an old poem of mine that I knew had seen widespread copying, both plagiarized and attributed. However, after performing the search, <a href="http://blekko.com/ws/http://www.ravensrants.com/friends-or-lovers/+/duptext">Blekko failed to find a single copy of the poem on any other site</a>, even though <a href="http://www.google.com/search?sourceid=chrome&#038;ie=UTF-8&#038;q=%22Though+our+voices+could+call+out+in+comfort%22">a simple Google search finds about 40 results</a>, though many are admittedly duplicates.</p>
<p><strong>Blekko Results:</strong> 0 <strong>Google Results:</strong> 40</p>
<h4>Test 2: Poem 2</h4>
<p>Testing with another poem produced very similar results. However, this time <a href="http://blekko.com/ws/http://www.ravensrants.com/ghost-within-my-mind/+/duptext">Blekko didn&#8217;t even find duplicates on my site</a> and instead simply indicated that there were no duplicates at all. However, once again, <a href="http://www.google.com/search?sourceid=chrome&#038;ie=UTF-8&#038;q=%22like+vapors+disappearing+before+my+eyes%22#q=%22like+vapors+disappearing+before+my+eyes%22&#038;hl=en&#038;safe=off&#038;prmd=iv&#038;ei=Y-DzTJH7A8H_lgeqxOmBDQ&#038;start=60&#038;sa=N&#038;filter=0&#038;fp=5831956345d34357">a simple Google search turned up about forty results</a> though, as before, many were duplicates or copies on my domain.</p>
<p><strong>Blekko Results:</strong> 0 <strong>Google Results:</strong> 39</p>
<h4>Test 3: Story</h4>
<p>Following the lack of luck with the two poems, I then tried an old short story of mine that had seen a small amount of copying. However, once again, <a href="http://blekko.com/ws/http://www.ravensrants.com/soulripper/+/duptext">Blekko failed to find any results that were not on my domain</a> and <a href="http://www.google.com/search?sourceid=chrome&#038;ie=UTF-8&#038;q=%22However,+by+lunchtime+the+sun+had+come+out+and+most+of+the+snow+had+melted+away.%22#q=%22However%2C+by+lunchtime+the+sun+had+come+out+and+most+of+the+snow+had+melted+away.%22&#038;hl=en&#038;safe=off&#038;prmd=iv&#038;filter=0&#038;fp=5831956345d34357">a quick Google search turned up a duplicate of the story</a> on a DeviantArt account.</p>
<p><strong>Blekko Results:</strong> 0 <strong>Google Results:</strong> 1</p>
<h4>Test 4: PT Post</h4>
<p>Finally, I tried an old, popular post from Plagiarism Today to see how well its content was detected. However, once again, <a href="http://blekko.com/ws/http://www.plagiarismtoday.com/2006/12/07/what-porn-can-teach-us-about-piracy/+/duptext">Blekko failed to return any results</a> and <a href="http://www.google.com/search?sourceid=chrome&#038;ie=UTF-8&#038;q=%22No+one+seems+to+be+able+to+go+agree+on+exactly+how+much+of+the+traffic+on+file+sharing%22">Google found a duplicate version of the piece</a> on what appears to be a BlogSpot spam blog (one I was previously unaware of too).</p>
<p><strong>Blekko Results:</strong> 0 <strong>Google Results:</strong> 1</p>
<h4>Test 5: Whole Domain</h4>
<p>Finally, I ran my entire old literature domain through Blekko using the /domainduptext slash to see what would happen. It <a href="http://blekko.com/ws/ravensrants.com+/domainduptext">found only 6 offsite domains and 11 offsite URLs</a>, even though many individual pieces see more reuse than that. It was missing many domains with widespread reuse of my work (legitimate and plagiarized) including blogspot.com, myspace.com and deviantart.com to name just a few.</p>
<p>Worse still, I couldn&#8217;t examine any of the individual links as clicking the link provided by Blekko just took me to the SEO page for that domain, not to a list of suspect URLs on the site or even to the domain itself.</p>
<p>In short, if I wanted to find out exactly how my content was used on these sites, it was up to me to find it.</p>
<p><strong>Blekko Results:</strong> 11 <strong>Google Results:</strong> N/A</p>
<h4>Other Issues</h4>
<p>It became pretty clear that Blekko was missing a lot of duplicate content with its searches. My suspicion is that it&#8217;s because it tries to home in only on what it considers the best sites and cuts out spam blogs and other sites it deems to be of low value.</p>
<p>While this may be great for searchers, it creates a real problem when checking for duplicate content as these are often the exact sites you need to find. </p>
<p>However, that can&#8217;t be the only cause of the problem. <a href="http://blekko.com/ws/%22Though+our+voices+could+call+out+in+comfort%22">If you use Blekko to search for quotes from the relevant pieces</a>, you get much more respectable results. Though the results aren&#8217;t nearly as good as Google&#8217;s in this area, they are definitely much more useful than those from either of the slashes.</p>
<p>But the biggest problem is what one does after they find content reuse via Blekko. With the /domainduptext slash you can&#8217;t even access the individual URLs to investigate further. The /duptext slash is a much more robust tool, taking you to a page where the duplicate content is highlighted, but in the pages I did check the results were hit and miss: as many as half of the pages linked had no duplicate content at all.</p>
<p>All in all, as useful as Blekko is for other kinds of searches, or at least as useful as it might be, it doesn&#8217;t handle duplicate content searches very well, certainly no better than Copyscape or even regular Google.</p>
<h4>Bottom Line</h4>
<p>None of this is meant to be a slight against Blekko in any other regard. The other searches I did with it were actually pretty useful and, though I wasn&#8217;t swayed enough to change my default search engine, I did enjoy a lot of what Blekko had to offer and can see myself making some slashes for my use.</p>
<p>In the end though, it just isn&#8217;t a good tool for detecting plagiarism, copyright infringement or other kinds of duplicate content. Though the idea is solid and its integration with other SEO functions very appealing to some, it just isn&#8217;t accurate or complete enough at this time.</p>
<p>Still, as with other tools I&#8217;ve reviewed, there is hope for the future. But it remains to be seen if this will be a priority for Blekko, which is clearly targeting a more generic search audience. Duplicate content detection is a highly specialized skill and the tools to find a keyword on the Web aren&#8217;t the same as the ones to find an article on every site where it appears. </p>
<p>As such, this will most likely remain a nice idea by a decent search engine that just isn&#8217;t practical.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/11/30/can-blekko-help-detect-copied-content/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New Ruling on Proper DMCA Takedowns</title>
		<link>http://www.plagiarismtoday.com/2010/07/29/new-ruling-on-proper-dmca-takedowns/</link>
		<comments>http://www.plagiarismtoday.com/2010/07/29/new-ruling-on-proper-dmca-takedowns/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 19:56:27 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[eff]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[perfect 10]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Search-Engines]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=7400</guid>
		<description><![CDATA[Perfect 10 suffered yet another blow in court and the ruling is one that all DMCA filers need to pay close attention to, lest their notices be ignored.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/07/perfect-10-logo.jpg" alt="" title="perfect-10-logo" width="253" height="50" class="alignleft size-full wp-image-7403" /></p>
<p>Repeat copyright litigator and pornography company Perfect 10 <a href="http://www.eff.org/deeplinks/2010/07/perfect-10-v-google-round-3-goes-google-no-sloppy">has had yet another verdict go against it</a>, this one in its ongoing battle with Google over what Perfect 10 alleges are images in Google&#8217;s index that infringe its copyright.</p>
<p>According to Perfect 10, Google has not done enough to remove such infringing images from their index and has not responded appropriately to DMCA takedown notices. The court, however, has largely disagreed.</p>
<p>The court tossed out most of Perfect 10&#8217;s claims, saying that they did not provide adequate notice for Google nor did they meet the minimum standards under the law. In short, their DMCA notices were not adequate and Google cannot be held liable for not taking action on them. </p>
<p>A small subset of Perfect 10&#8217;s notices were deemed to be valid and Google now must show that it responded expeditiously to remove the infringing material or potentially face liability.</p>
<p>It has already been a case that all DMCA filers need to watch closely and this recent verdict only reaffirms that. Fortunately, the lessons from this ruling are very simple to understand.</p>
<h4>Lessons For DMCA Filers</h4>
<p>The basic lesson from this ruling is very straightforward: don&#8217;t file sloppy DMCA notices. You, as the copyright holder, cannot place the burden on the host or the search engine to do the research on your claim, nor can you simply dump a collection of links and source items on the host&#8217;s doorstep for them to sort through and figure it out.</p>
<p>Perfect 10 filed at least some of its notices by including a cover letter and a spreadsheet of URLs. Oftentimes the URLs did not link to the infringing material and the source content was not clearly identified or was one of thousands of images on a DVD.</p>
<p>Essentially, the court ruled that a proper DMCA notice needs to include all the required information in a single written communication and could not force any undue burden on the host. This seems reasonable enough, but Perfect 10 did not do that with the majority of its notices.</p>
<p>In one regard, this recent Perfect 10 ruling is a victory for DMCA filers, as the court upheld properly filed notices and may still find Google liable in those cases depending on what action Google took. More importantly, though, it is a word of caution that, when filing DMCA notices, you need to make sure that they are sent in the proper format or they can be legally ignored.</p>
<h4>What It Means For You</h4>
<p>If you use the <a href="http://www.plagiarismtoday.com/stock-letters/">stock letters</a> provided here and <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/">the system I preach</a>, it means pretty much nothing.</p>
<p>In fact, the steps required by the court, making the information clear, providing all the data in one communication, etc. are all good practices regardless. If one is interested in seeing the DMCA notice executed in a timely manner, these are the steps one should take anyway. </p>
<p>I&#8217;ve never had a DMCA notice rejected as being incomplete using the current template and don&#8217;t see any reason I would following this ruling.</p>
<p>This ruling should serve as a warning to those who might play games with DMCA notices in a bid to &#8220;trap&#8221; hosts into non-compliance, but those who are filing notices in good faith and working to make things as efficient as possible for all parties have nothing to fear.</p>
<h4>Bottom Line</h4>
<p>Clearly, the Perfect 10 case is ongoing and we will see the outcome on the remaining notices. But with the majority of the works now tossed out, it is clear that Perfect 10 will never see the full outcome it wants.</p>
<p>But while that may concern some DMCA filers, the truth is that Perfect 10&#8217;s method of filing notices was so out of the norm for how the process is supposed to work that the court found it to be invalid.</p>
<p>In short, even if you don&#8217;t do things exactly like me, you&#8217;ll probably find that you&#8217;ll be on the right side of the law so long as you work to make things as simple and clear as possible for the company you are filing with and don&#8217;t try to reinvent the wheel.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/07/29/new-ruling-on-proper-dmca-takedowns/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Updating &#8220;Stopping Internet Plagiarism&#8221; Part 2</title>
		<link>http://www.plagiarismtoday.com/2010/04/07/updating-stopping-internet-plagiarism-part-2/</link>
		<comments>http://www.plagiarismtoday.com/2010/04/07/updating-stopping-internet-plagiarism-part-2/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 18:26:05 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Housekeeping]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=6296</guid>
		<description><![CDATA[The second and final part of the update to the "Stopping Internet Plagiarism" section is now done. Here are the major changes.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/03/weak-link.jpg" alt="" title="weak-link" width="245" height="159" class="alignleft size-full wp-image-6215"></p>
<p><a href="http://www.plagiarismtoday.com/2010/03/31/updating-stopping-internet-plagiarism-part-1/">As I mentioned last week</a>, the &#8220;<a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/">Stopping Internet Plagiarism</a>&#8221; section has been a cornerstone of the site since day one, explaining the ins and outs of fighting plagiarism and other misuse of your content, but it has also fallen into disrepair due to age and neglect.</p>
<p>Last week, I began the process of overhauling and repairing the damage. I fixed the main SIP page as well as the first three chapters. Today I&#8217;ve gone through and done the same for the last three, namely the <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/4-contacting-the-host/">Contacting the Host</a>, <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/6-when-all-else-fails/">When All Else Fails</a> and <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/the_long_haul/">The Long Haul</a> pages.</p>
<p>In these chapters the changes were much more minor than in the first round, as there haven&#8217;t been many significant changes in the law or technology in these areas. Instead, most of the changes were to formatting and grammar, or for clarification.</p>
<p>As with the first round, I wouldn&#8217;t call these updates a complete rewrite; most of the original text is still there, even more so in this case.</p>
<p>The changes included in this update are:</p>
<ol>
<li>Updated the Contacting the Host page to better integrate in the stock letters.</li>
<li>Removed the section about &#8220;Causing a Ruckus&#8221; from the &#8220;When All Else Fails&#8221; page as it is no longer a policy I agree with. </li>
<li>Updated search engine statistics and links in the &#8220;When All Else Fails&#8221; page.</li>
<li>Added information about Zoho Creator to the &#8220;The Long Haul&#8221; page.</li>
<li>Edited for styling and added images to every page.</li>
</ol>
<p>Please let me know if you have any thoughts or suggestions for this section as these pages are easily among the most popular on the site and I want to make sure they are some of the best Plagiarism Today has to offer.</p>
<p>Sometime soon, I plan on doing the same to the <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/your-copyrights-online/">Your Copyrights Online</a> section but I have no definite timetable now. I&#8217;ll post more information as I come up with a firmer plan.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/04/07/updating-stopping-internet-plagiarism-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Plagiarism Detection Showdown: Bing vs. Yahoo! vs. Google</title>
		<link>http://www.plagiarismtoday.com/2009/06/03/plagiarism-showdown-bing-vs-yahoo-vs-google/</link>
		<comments>http://www.plagiarismtoday.com/2009/06/03/plagiarism-showdown-bing-vs-yahoo-vs-google/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 19:03:04 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[copyright infirngement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[plagiarism-detection]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=3660</guid>
		<description><![CDATA[Which of the search engines is the best for detecting plagiarism? Here's a 5-round deathmatch to find out!]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/06/logocombo.jpg" alt="logocombo" title="logocombo" width="200" height="200" class="alignleft size-full wp-image-3677" /></p>
<p>With the launch of Microsoft&#8217;s new Bing search engine, there has been a lot of talk about which site produces the best results. Though Google is far and away the king in terms of popularity, Yahoo! still holds onto a sizeable minority and, with Microsoft&#8217;s new offering, there is a promise of a three-way horse race.</p>
<p>But for readers of this site, one of the critical questions when choosing a search engine is which does the best job of finding duplicated content? Whether one is looking to verify that a work is original or track down plagiarism of their own work, this is an important question.</p>
<p>So I decided to take the three search engines and put them head to head in what I am calling the &#8220;5 Round Search Engine Deathmatch&#8221;. The goal is to find which search engine performs the best at detecting plagiarism and, if possible, find the strengths and weaknesses of the three.</p>
<p>Without any further ado, here is how it went down.<span id="more-3660"></span></p>
<h4>The Test</h4>
<p>The goal of the test is not to be exhaustive, but to provide a quick overview of the various capabilities of the three search engines in this area. To do that, I chose five different works and ran unique phrases of those works through the search engines.</p>
<p>The first two works were poems of mine that had seen widespread reuse but had limited enough copying to be useful. The next two were prose works, both with very limited reuse. The last was the Declaration of Independence, to see how many matches the search engines reported for that.</p>
<p>With each set of results, I went through by hand, counted the number of clear duplicate entries (matches where it was the same content, but at a different URL) and subtracted them from the initial results to create a new total. The results of the five rounds are below.</p>
<h4>Round 1</h4>
<p>For this round <a href="http://www.ravensrants.com/in-the-dark/">a poem with widespread copying</a> was used.</p>
<table border="0" cellspacing=10>
<tbody>
<tr>
<td><strong>Search Engine</strong></td>
<td><strong>Initial Results</strong></td>
<td><strong>Duplicates</strong></td>
<td><strong>Final Total</strong></td>
</tr>
<tr>
<td><strong>Bing</strong></td>
<td>13</td>
<td>2</td>
<td>11</td>
</tr>
<tr>
<td><strong>Google</strong></td>
<td>47</td>
<td>17</td>
<td>30</td>
</tr>
<tr>
<td><strong>Yahoo!</strong></td>
<td>57</td>
<td>3</td>
<td>54</td>
</tr>
</tbody>
</table>
<p>The first test was a big win for Yahoo!. It not only found more results initially but also had fewer duplicates, casting both a broader and a more fine-tuned net. However, Google&#8217;s results were still respectable, catching most of the worst infringement despite having barely more than half of Yahoo!&#8217;s results.</p>
<p>Bing, on the other hand, brings up the rear. With barely a fifth of Yahoo!&#8217;s and a third of Google&#8217;s results, it didn&#8217;t perform up to any standard.</p>
<p><strong>Round Winner:</strong> Yahoo! for both better detection and better duplicate filtering.</p>
<h4>Round 2</h4>
<p>For this round, <a href="http://www.ravensrants.com/teardrops/">another widely-copied poem</a> was used.</p>
<table border="0" cellspacing=10>
<tbody>
<tr>
<td><strong>Search Engine</strong></td>
<td><strong>Initial Results</strong></td>
<td><strong>Duplicates</strong></td>
<td><strong>Final Total</strong></td>
</tr>
<tr>
<td><strong>Bing</strong></td>
<td>6</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td><strong>Google</strong></td>
<td>86</td>
<td>6</td>
<td>80</td>
</tr>
<tr>
<td><strong>Yahoo!</strong></td>
<td>43</td>
<td>3</td>
<td>40</td>
</tr>
</tbody>
</table>
<p>In this round Google turned the tables, doubling Yahoo!&#8217;s efforts. It found 80 non-duplicate results to Yahoo!&#8217;s 40. However, both had comparable duplicate filtering, both with approximately 7% duplicates. Not a bad percentage.</p>
<p>Bing, on the other hand, only found four results, just 5% of Google&#8217;s results and 10% of Yahoo!&#8217;s. It also had a much higher duplication problem with 33% of its results being duplicates. </p>
<p><strong>Round Winner:</strong> Google. Though it and Yahoo! had identical duplicate filtering, Google produced twice the results.</p>
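<p>For anyone who wants to check the arithmetic, the final totals and duplicate percentages quoted for this round can be reproduced with a few lines of Python. This is only a minimal sketch: the counts are copied from the Round 2 table, while the names <code>ROUND_2</code>, <code>tally</code> and <code>summarize</code> are made up for illustration and were not part of the original test, which was done by hand.</p>

```python
# Round 2 figures copied from the table above: (initial results, duplicates).
# The names used here are hypothetical; the original tally was done by hand.
ROUND_2 = {
    "Bing": (6, 2),
    "Google": (86, 6),
    "Yahoo!": (43, 3),
}


def tally(initial, duplicates):
    """Return (final total, duplicates as a percentage of initial results)."""
    final_total = initial - duplicates
    duplicate_rate = 100.0 * duplicates / initial if initial else 0.0
    return final_total, duplicate_rate


def summarize(round_results):
    """Map each engine name to its (final total, duplicate rate) pair."""
    return {name: tally(*counts) for name, counts in round_results.items()}


if __name__ == "__main__":
    for name, (final_total, rate) in summarize(ROUND_2).items():
        print(f"{name}: {final_total} unique results, {rate:.0f}% duplicates")
```

<p>The same function works for the other hand-counted rounds if the table values are swapped in. An engine with zero initial results is given a 0% duplicate rate to avoid dividing by zero.</p>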
<h4>Round 3</h4>
<p>For this round, <a href="http://www.ravensrants.com/loner/">a prose piece with limited reuse</a> was used.</p>
<table border="0" cellspacing=10>
<tbody>
<tr>
<td><strong>Search Engine</strong></td>
<td><strong>Initial Results</strong></td>
<td><strong>Duplicates</strong></td>
<td><strong>Final Total</strong></td>
</tr>
<tr>
<td><strong>Bing</strong></td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><strong>Google</strong></td>
<td>6</td>
<td>0</td>
<td>6</td>
</tr>
<tr>
<td><strong>Yahoo!</strong></td>
<td>4</td>
<td>1</td>
<td>3</td>
</tr>
</tbody>
</table>
<p>Google has another solid round, with 6 copies found and no duplicates. Yahoo!, on the other hand, only found four sites and one was a repeat. This is a clear case of Google finding more copies of a work, three to be specific.</p>
<p>Bing, on the other hand, found nothing. It didn&#8217;t even find the original site, which would have been good for one point. In short, Bing flat out failed in this test.</p>
<p><strong>Round Winner:</strong> Google for both finding more matches and providing more accurate results.</p>
<h4>Round 4</h4>
<p>For this round, <a href="http://www.ravensrants.com/soulripper/">a prose work with a modest amount of known copying</a> was used.</p>
<table border="0" cellspacing=10>
<tbody>
<tr>
<td><strong>Search Engine</strong></td>
<td><strong>Initial Results</strong></td>
<td><strong>Duplicates</strong></td>
<td><strong>Final Total</strong></td>
</tr>
<tr>
<td><strong>Bing</strong></td>
<td>2</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td><strong>Google</strong></td>
<td>27</td>
<td>22</td>
<td>5</td>
</tr>
<tr>
<td><strong>Yahoo!</strong></td>
<td>2</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>The results of this test were interesting. Google picked up the most results but its initial find of 27 results was complicated by an enormous duplication issue. A full 22 of the results were obvious duplicates, including over a dozen results pertaining to the same forum posting. This pushed the real infringements to the last page of the results.</p>
<p>Yahoo! and Bing both found two results, both on my site. One, in both cases, was a duplicate.</p>
<p><strong>Round Winner:</strong> No one. I thought long and hard about this one and have decided that no one deserves to win this round. Google found infringements but the results were so buried as to be useless. Yahoo! and Bing failed to find anything. </p>
<h4>Round 5</h4>
<p>For this round, a line from the opening of the <a href="http://www.ushistory.org/declaration/document/index.htm">Declaration of Independence</a> was used.</p>
<table border="0" cellspacing=10>
<tbody>
<tr>
<td><strong>Search Engine</strong></td>
<td><strong>Initial Results</strong></td>
<td><strong>Duplicates</strong></td>
<td><strong>Final Total</strong></td>
</tr>
<tr>
<td><strong>Bing</strong></td>
<td>4,800,000</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td><strong>Google</strong></td>
<td>53,700</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td><strong>Yahoo!</strong></td>
<td>118,000</td>
<td>N/A</td>
<td>N/A</td>
</tr>
</tbody>
</table>
<p>For these results, due to their length, I am forced to take the search engine estimations at face value. However, I am having a very hard time believing Bing&#8217;s results.</p>
<p><a href="http://files.plagiarismtoday.com/wp-content/uploads/2009/06/bing-bs.jpg"><img src="http://files.plagiarismtoday.com/wp-content/uploads/2009/06/bing-bs-300x72.jpg" alt="bing-bs" title="bing-bs" width="300" height="72" class="size-medium wp-image-3674" /></a></p>
<p>This would indicate that Bing, the out and out loser up until now, somehow found some 28x more results than Google and Yahoo! combined. This seems outrageous on multiple fronts and likely points to a flaw in Bing&#8217;s URL counting system. (Note: Neither Yahoo! nor Google has an extremely reliable system either, but I was able to hand check the results up to this point.)</p>
<p>As such, I am tossing Bing&#8217;s results for this round. Of the two that are within the realm of possibility, Yahoo! has a definite edge with more than double the results. Still, I don&#8217;t put too much stock in this test due to the unreliable nature of search engine self-reporting.</p>
<p><strong>Round Winner:</strong> Yahoo! found more results and takes the round, though this really is not a definitive test, even less of one than the others.</p>
<h4>Final Results</h4>
<p>Of the four rounds that had winners, Yahoo! and Google split them 2-to-2. However, Yahoo!&#8217;s victory in the last round is both less compelling and of less use to most copyright holders than the other rounds so I have to declare Google, by a hair, the overall winner in this deathmatch.</p>
<p>But while this probably will not settle the debate about Yahoo! vs. Google, it does indicate one thing very clearly: for the purpose of plagiarism detection, Bing is a wash. Not only did Bing place or tie for last in every test (save the dubious results in the fifth one), it failed to detect any copies of the content in one.</p>
<p>To make matters even worse, Bing also makes it very difficult to filter out duplicate entries. Where Yahoo! and Google both attempt (somewhat unsuccessfully) to group likely duplicates together, Bing, it seems, does not. In many cases duplicate pages were spread across multiple pages rather than being indented under the original or clustered together.</p>
<p>Whether or not you make Bing your search engine of choice is your decision, but it probably should not be your search engine of choice for finding copies of your content.</p>
<h4>Bottom Line</h4>
<p><a href="http://www.plagiarismtoday.com/2009/05/07/plagium-a-copyscape-alternative/">I&#8217;ve said it before and I will say it again</a>. Don&#8217;t rely on any one search engine for your plagiarism detection. Both Yahoo! and Google found results the other missed. It is that simple. </p>
<p>However, if we&#8217;ve learned anything from this test it is that Bing, at least right now, is not ready to be relied on for plagiarism/copy detection. Whether one thinks its regular search results are solid or not, its phrase search results, the ones used to detect this kind of copying, are very weak.</p>
<p>This may be a bit unfair in that both Yahoo! and Google are established search players while Bing is just a preview release. However, since Live.com search results are already being forwarded on and <a href="http://searchengineland.com/internet-explorer-6-forces-bing-as-default-search-provider-20398">users of IE6 are having it forced on them as their default search engine</a>, it seems fair enough to put it to the test.</p>
<p>Right now, Bing is not up to this challenge, though my hope is that, as Microsoft works on the search engine and improves it, it may grow to become a viable third competitor. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2009/06/03/plagiarism-showdown-bing-vs-yahoo-vs-google/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Google Alerts to add RSS</title>
		<link>http://www.plagiarismtoday.com/2008/10/10/google-alerts-to-add-rss/</link>
		<comments>http://www.plagiarismtoday.com/2008/10/10/google-alerts-to-add-rss/#comments</comments>
		<pubDate>Fri, 10 Oct 2008 15:21:40 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[content detection]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[digital fingerprints]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Alerts]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Search-Engines]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1935</guid>
		<description><![CDATA[A recent article in the Wall Street Journal has given reason for many Google Alerts users to rejoice, the famous email alert service will soon be getting RSS support. ]]></description>
			<content:encoded><![CDATA[<p><IMG SRC="http://www.plagiarismtoday.com/images/google-alerts-20081010-100845.png" alt="Google Alerts Logo" align="left" class="picleft">A recent article in the Wall Street Journal by Walter Mossberg about <a href="http://online.wsj.com/article/SB122281243658792073.html">how to use alerts to keep track of the Web</a> dropped something of a bombshell for those of us who use <a href="http://www.google.com/alerts">Google Alerts</a> every day. According to Mossberg, Google Alerts will begin adding RSS alerts in addition to email ones &#8220;in about a month&#8221;.</p>
<p>Google Alerts, which is a service that sends out notices when content carrying the alert search term appears on the Web, currently only sends out its alerts via email. It is commonly used for vanity searches, for keeping on top of who mentions a person or site, and for keeping track of content, either through searches for <a href="http://www.plagiarismtoday.com/2005/11/07/tips-for-using-google-alerts/">statistically improbable phrases</a> or <a href="http://www.plagiarismtoday.com/2006/10/04/digital-fingerprints-to-detect-rss-scraping/">digital fingerprints</a>. </p>
<p>What this means to you will probably depend on how heavily you use RSS and how much use you make of Google Alerts. If you are not currently using Google Alerts and want to get started, I&#8217;ve <a href="http://www.plagiarismtoday.com/2008/01/24/video-how-to-use-google-alerts/">created a screencast to help you understand the basics</a>.</p>
<p>Obviously, I&#8217;ll have more to say on this once the new feature is made public. </p>
<p>However, at this time, I don&#8217;t see myself making heavy use of the RSS feature. I literally have years of experience meshing Google Alerts with email filters and creating a workflow around it. Though such a system could be moved to RSS easily, I don&#8217;t see how much is gained in my case.</p>
<p>Clearly though, this feature is not for people like me and other current heavy users of Google Alerts. Instead, it is for those who don&#8217;t use the service because they can&#8217;t get the alerts in the format they want. This will change that and let them receive their alerts in a variety of places including their RSS reader, their Google home page and through a variety of mashup services.</p>
<p>Needless to say, this opens up a lot of new doors for Google Alerts but, personally, I&#8217;m just happy to hear that the service is still receiving some attention. After being so long without a significant upgrade, it is nice to see that Google is still working on their Google Alerts product. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/10/10/google-alerts-to-add-rss/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>An Inside Look at iCopyright Discovery</title>
		<link>http://www.plagiarismtoday.com/2008/09/30/inside-look-at-icopyright-discovery/</link>
		<comments>http://www.plagiarismtoday.com/2008/09/30/inside-look-at-icopyright-discovery/#comments</comments>
		<pubDate>Tue, 30 Sep 2008 17:22:15 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[content detection]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[creators]]></category>
		<category><![CDATA[discovery]]></category>
		<category><![CDATA[icopyright]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[tracking]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1824</guid>
		<description><![CDATA[The iCopyright Discovery system promises to revolutionize the way copyright holders track and protect their work. Now we get an inside look at what the system has to offer copyright holders. ]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/icopyright-logo1.png" alt="icopyright-logo.png" border="0" width="174" height="59" align="left" class="picleft" />Earlier this month, <a href="http://www.plagiarismtoday.com/2008/09/16/icopyright-announces-content-tracking-tool/">I reported on iCopyright&#8217;s new content tracking tool Discovery</a>. At that point, I only had the information provided in the press release for the service.</p>
<p>However, last week, Mike O&#8217;Donnell, the President and CEO of iCopyright, was kind enough to give me a guided tour of the backend. Though I wasn&#8217;t able to access anything hands on or experiment with the technology with my own content (that will have to wait until the service is available for <a href="http://creators.icopyright.com/">iCopyright for Creators</a> users), I was able to see what the service does, how it works and what it can do.</p>
<p>So here is a brief look at what the iCopyright Discovery system can do and how it will likely look when it is available for Creators users shortly. Please bear in mind that this is not a review, just a tour of the key features of the service. <span id="more-1824"></span></p>
<h4>The Basic Premise</h4>
<p>The big idea of Discovery is this: Discovery parses your content as you put it up on the Web, accessing either a created XML file or your RSS feed, and then searches for copies of it on the Web. </p>
<p>The service then highlights the matches that it determines to be the most important and gives you options for remedying the situation. Among the actions it can perform are removal requests, which are fundamentally DMCA notices, license requests, which go through iCopyright&#8217;s existing licensing system, and forwarding to legal counsel.</p>
<p>This idea is fundamentally very similar to <a href="http://attributor.com">Attributor</a> and <a href="http://www.blogwerx.com/">Blogwerx</a>, both of which are still in private testing. However, the execution of the system is going to be what is important. On that front, iCopyright has devised an interesting workflow system that seems to string the process together very well.</p>
<h4>Setting Up Discovery</h4>
<p>When a user first signs in to Discovery, the first page they&#8217;re likely going to head to is, oddly enough, the &#8220;Settings&#8221; page. The reason for this is that, without visiting the settings page, you have little control over the matches you see and you can&#8217;t use several of the remedy options. </p>
<p><a href="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/settings.jpg"><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/settings-300x220.jpg" alt="" title="settings" width="300" height="220" class="alignleft size-medium wp-image-1830" /></a></p>
<p>From this page, you can set your enforcement agency, useful if you are part of a group that handles your copyright enforcement, and the email address of your legal counsel. This will let you enable additional redress steps down the road. However, the most important settings are the search sensitivity and risk assessment as they determine the matches you see down the road.</p>
<p>The search sensitivity feature allows users to tell Discovery how many matches they want. They can set it so that only the worst matches appear in the system or so that they see almost everything. This is done by tweaking the minimum match ratio, meaning how much of the original work must appear in the copy, the minimum risk factor, discussed below, the minimum site activity and the minimum number of copied words that must appear in the match, useful for sites with short posts.</p>
<p>The Risk Assessment tool is easily one of the most interesting features in iCopyright Discovery. It lets users set the criteria for determining how much of a risk a match site is. You do that by setting sliders for Unique Visitors, which looks at the estimated traffic of the site, the number of inbound links, whether the site displays ads and how much of the content it copies.</p>
<p>These sliders are intended to be abstract in nature and are used to indicate which attributes are more important than others. For example, if you set all to 10, they would be weighted equally. However, if you put one at 5 and the others at 10, that one would be weighted much less. </p>
<p>These attributes, when combined with the site&#8217;s actual use of the content, are used to determine the risk level of the site itself. This, in turn, plays a major role in determining the priority the site is given when analyzing suspect pages. </p>
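<p>To make the slider idea concrete, here is a rough sketch of how relative weights like these could be combined into a single score. To be clear, this is my own illustration and not iCopyright&#8217;s actual formula, which was not shown to me: the weighted average, the function name <code>risk_score</code>, the attribute names and the sample site values are all assumptions.</p>

```python
def risk_score(attributes, weights):
    """Weighted average of attribute scores, each assumed to be in 0.0-1.0.

    Hypothetical formula for illustration only; it is not iCopyright's.
    """
    total_weight = sum(weights[name] for name in attributes)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(attributes[name] * weights[name] for name in attributes)
    return weighted_sum / total_weight


# Made-up attribute scores for one suspect site, normalized to 0.0-1.0.
site = {"visitors": 0.8, "inbound_links": 0.4, "ads": 1.0, "copied": 0.6}

# All sliders at 10: every attribute counts equally.
equal = {"visitors": 10, "inbound_links": 10, "ads": 10, "copied": 10}

# One slider at 5, the rest at 10: traffic counts half as much as the others.
less_traffic = dict(equal, visitors=5)

if __name__ == "__main__":
    print(round(risk_score(site, equal), 3))
    print(round(risk_score(site, less_traffic), 3))
```

<p>Because only the ratios between the sliders matter in a scheme like this, setting everything to 10 gives the same result as setting everything to 5, which fits the description of the sliders as abstract rather than absolute values.</p>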
<h4>Sorting Matches</h4>
<p>Once you are done telling Discovery what matches you want to see, the system then does a refresh, which takes about an hour according to O&#8217;Donnell, and you can then view your matches or &#8220;suspects&#8221;.</p>
<p><a href="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/suspect_list.jpg"><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/suspect_list-300x213.jpg" alt="" title="suspect_list" width="300" height="213" class="alignleft size-medium wp-image-1831" /></a></p>
<p>The match sort is organized by a combination of variables, focusing heavily on suspect pages with the highest risk. For each suspect, the system displays the URL of the work, whether it displays ads, whether it links back to your site, roughly how many visitors it gets, the number of inbound links to the site, the match percentage and the risk.</p>
<p>From this page, you can go through the matches and either archive the match, which functions similarly to Gmail&#8217;s archive function and takes no action, move it to the Whitelist, either pending or approved, or send it to the redress list.</p>
<p>If a site is moved to the whitelist, that means that the use is licensed and future matches from the site will be ignored. You have the option of telling the system to either ignore matches on the URL, the subdomain or the entire domain.</p>
<p>If you move it to the redress list, you can then take further action on the match, including licensing the work or filing a removal demand.</p>
<h4>Taking Action</h4>
<p>The redress list, as you see below, looks very similar to the suspect list and contains much of the same information. However, the options for what one can do with a suspect are different on this page.</p>
<p><a href="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/redress_list.jpg"><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/redress_list-300x205.jpg" alt="" title="redress_list" width="300" height="205" class="alignleft size-medium wp-image-1829" /></a></p>
<p>From this page, you can then either offer the site a license, which will send out an email encouraging the site admin to go through the existing iCopyright system, file a link request or send a removal notice.</p>
<p>Removal notices, fundamentally, are DMCA notices though they are written so that, at this stage, they can be sent to Webmasters directly. Link requests are more like informal license offers, but ones where the only stipulation is a link back.</p>
<p>All of the letter types are fully customizable and Discovery offers a templating system that lets you build your own letter that automatically inserts the necessary information.</p>
<p>Once you file a redress, you can then track the status of it in the Redress Offers Status page. From there, it will let you know if the redress has been completed and, if it hasn&#8217;t, make it available to be escalated. </p>
<p>If a suspect match is moved to the escalation list, then the user has a whole new series of options for how to deal with the site. </p>
<p><a href="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/escalation_list.jpg"><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/escalation_list-300x196.jpg" alt="" title="escalation_list" width="300" height="196" class="alignleft size-medium wp-image-1828" /></a></p>
<p>The options include the ability to forward the situation to your legal counsel (if set up), notify the ISP, which sends a more traditional DMCA notice, notify the enforcement agency (if set up), send a notice to the ad network or demand removal from the search engines. </p>
<p>All in all, the initial Redress List can be looked at as the cease and desist/licensing phase while the Escalation List deals more with the DMCA/lawyer phase. </p>
<p>However, no matter what redress steps you take, Discovery offers a powerful means to track and monitor the progress of the steps that you took. </p>
<h4>Tracking and Monitoring</h4>
<p>Once you&#8217;ve taken a redress action against a suspect site, you can then track and monitor everything that has to do with that particular match. </p>
<p><a href="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/action_audit_trail.jpg"><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/09/action_audit_trail-300x219.jpg" alt="" title="action_audit_trail" width="300" height="219" class="alignleft size-medium wp-image-1826" /></a></p>
<p>It provides much more than just a brief history of what has taken place, giving a detailed history of every email sent, comments left in the system, both automatic ones and ones left by the user, as well as other information about the site.</p>
<p>The idea is to maintain a record of every action, including emails, phone calls and other steps, for the purpose of aiding in any potential legal case. </p>
<p>Once the matter is resolved, escalated outside of the system or the match is whitelisted, the case can be archived and thus removed from the suspect pool, allowing you to move on to other matches.</p>
<h4>Some personal thoughts</h4>
<p>It is very hard for me to offer any real review of the service. Without actually being hands on with the service and using it against my own content, there is not much that I can do.</p>
<p>Right now there are many unknowns for me, including the following: </p>
<ol>
<li><strong>Match Detection:</strong> O&#8217;Donnell has said they are partnering with a major search provider to perform the detection but it remains to be seen how effective it is. Match detection is not easy, even with a big search partner, <a href="http://www.plagiarismtoday.com/2007/10/02/copyscape-improved-again/">as Copyscape showed</a>. The system will not be of much use if its match detection is not the best in its class.</li>
<li><strong>Resolution Assistance:</strong> The hardest part about stopping a plagiarist is not composing the letter, but finding who to send it to. It is easily the biggest time sink in most of my cases and is the number one reason people approach me for help. It remains to be seen how effectively Discovery helps with this process.</li>
<li><strong>Speed/Usability:</strong> Obviously, without actually using the system, I can&#8217;t tell how fast it moves and how much time it will save you. If the system is sluggish or error-prone, it could greatly hurt its usefulness.</li>
</ol>
<p>This is not to say that these things are wrong with the current system, just that I don&#8217;t know right now and won&#8217;t until I can do a full review, likely later this year.</p>
<p>However, judging from what I can see, the system is very impressive. It looks very good, has a solid workflow built into it, though I somewhat disagree with having the ISP step be only available in the escalation section, and seems to be built with the user in mind.</p>
<p>What I like best about Discovery is how the user customizes the system to fit their needs, with their own definitions of what matches to worry about, their own letters and their own general strategy. Any such system should focus on automating what can be automated, but leaving the big decisions to the copyright holder.</p>
<p>What does worry me some is that the system is clearly geared toward larger clients. Discovery is designed to allow for multiple users to access an account and to work with attorneys as well as other rights enforcers. While those are great features for those that need them, it remains to be seen how the system will strip down for smaller copyright holders.</p>
<p>The other downside is that, according to O&#8217;Donnell, the version of Discovery for Creators will come with some kind of fee. Though pricing structure has not been discussed, he seemed confident that it would not be available for free.</p>
<p>Still, as these screenshots show, there is a lot to like in the Discovery system and the solution it promises.</p>
<p>It has a great deal of potential and Webmasters who are worried about tracking how their content is used should definitely take a serious look at what iCopyright has to offer.</p>
<h4>Conclusions</h4>
<p>There&#8217;s a lot of reason for me to be excited about the upcoming Discovery system. However, I have to restrain that excitement until I can use the system first hand and see both how effective it is and how smooth the process is.</p>
<p>No matter what though, I am happy to see that people are thinking about these issues and coming up with solutions. This has been a booming industry over the past few years and a lot of very smart companies are already involved. I am happy to be working in this field.</p>
<p>No matter what Discovery itself brings, it can only signal great things for copyright holders and Webmasters. Hopefully, this will help content creators not just enforce their rights, but understand how their work is being reused and encourage the kind of sharing that helps all involved.</p>
<p>Knowledge and tools can only help improve things, so long as those who use them do so wisely.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/09/30/inside-look-at-icopyright-discovery/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Popularity of Plagiarism</title>
		<link>http://www.plagiarismtoday.com/2008/07/02/the-popularity-of-plagiarism/</link>
		<comments>http://www.plagiarismtoday.com/2008/07/02/the-popularity-of-plagiarism/#comments</comments>
		<pubDate>Wed, 02 Jul 2008 15:44:15 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google trends]]></category>
		<category><![CDATA[MPAA]]></category>
		<category><![CDATA[plagiarim]]></category>
		<category><![CDATA[RIAA]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1290</guid>
		<description><![CDATA[Inspired by recent posts, I decided to take a look at Google Trends and see how search terms relative to content theft were doing. ]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-110241.png" alt="Google Trends Logo" align="left" class="picleft"/>A pair of recent articles, <a href="http://www.louisgray.com/live/2008/06/on-web-if-youre-not-growing-youre-dying.html" title="If You're Not Growing You're Dying">one by Louis Gray</a> and <a href="http://codingexperiments.com/archives/149" title="">another by possible248</a> (who co-authors the blog along with, among others, Voyagerfan5761; both are regulars here), showcased public interest in relevant search terms, namely company names and Linux distributions respectively, using <a href="http://trends.google.com/trends?hl=en" title="Google Trends">Google Trends</a>.</p>
<p>This, in turn, inspired me to do my own keyword analysis to gauge if and how public interest in topics relevant to this site has changed over the years. </p>
<p>What I found was surprising and seemed to run counter to what I was seeing with my own traffic but was interesting nonetheless.<br />
<span id="more-1290"></span></p>
<h4>Plagiarism</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-105214.png" alt="Google Trends for Plagiarism"></p>
<p>Perhaps the most obvious keyword and definitely the most common one that leads visitors to this site, this keyword has <a href="http://trends.google.com/trends?q=plagiarism&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Google Trends Plagiarism">seen surprisingly little change over the past few years</a>. </p>
<p>Overall, the graph for it is flat with a few &#8220;ticks&#8221; upward when news stories, such as the Obama controversy and the Kaavya Viswanathan scandal, broke. There are also seasonal downward ticks at the end of every year, likely due to the holidays.</p>
<p>In general, it appears that the overall interest in plagiarism, both academically and artistically, has remained consistent and unchanged.</p>
<h4>Content Theft</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/content-theft-google-trends-20080702-103956.png" alt="Google Trends for Content Theft"></p>
<p>Probably the most unusual graph, <a href="http://trends.google.com/trends?q=content+theft&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Content Theft on Google Trends">content theft as a search term</a> spiked in mid-2005, around the time this site was founded, and then leveled off, only to become a regular search term again in recent months.</p>
<p>It is unclear to me what has caused these specific spikes, but the latest one seems to be holding and showing some sustained interest in the topic, which could indicate greater public interest in the issue and in the term itself.</p>
<h4>Copyright</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-105332.png" alt="Google Trends for Copyright"></p>
<p>Copyright, on the other hand, <a href="http://trends.google.com/trends?q=Copyright&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Google Trends Copyright">has seen a marked decrease over the past few years</a>, at least as a search term.</p>
<p>While this seems counter-intuitive, considering that stories about copyright, especially as it pertains to the RIAA/MPAA, seem to dominate social news sites, people are clearly not searching for copyright information as much as they used to.</p>
<p>This is reflected even more strongly in the <a href="http://trends.google.com/trends?q=RIAA&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Google Trends RIAA">related graph for the RIAA</a> and <a href="http://trends.google.com/trends?q=DMCA&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0">the DMCA</a>, where the downward slope is even more pronounced and, in the case of the RIAA, seems to almost disappear completely.</p>
<p>Though it doesn&#8217;t appear that people have lost interest in copyright issues, it is clear that they are not searching for them as much as they once were.</p>
<h4>Duplicate Content</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-105447.png" alt="Google Trends for Duplicate Content"></p>
<p>One of the greater concerns people have about plagiarism is the issue of duplicate content. As we can see on the graph above, the term <a href="http://trends.google.com/trends?q=duplicate+content&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all" title="Google Trends Duplicate Content">rocketed onto the chart in early 2007</a>, stabilized and seems to be slowly marching upward. </p>
<p>Duplicate content, of course, covers not just plagiarism and scraping but a wide variety of SEO concerns. It is clear that this is a topic being talked about more and more, though it is unclear in what capacity the term is being searched for.</p>
<h4>Plagiarism Detection Tools</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/07/skitched-20080702-100727.png" alt="Google Trends for Copyscape"></p>
<p>The chart for <a href="http://www.copyscape.com">Copyscape</a> (shown above) shows a steady increase in the number of searches over the past year and a half. This seems to mesh with my own experience, which has shown a great increase in interest in content protection over the past 18 months.</p>
<p>Other plagiarism detection tools, such as <a href="http://www.bitscan.com">Bitscan</a> and <a href="http://www.attributor.com">Attributor</a>, did not have enough information for Google Trends to draw any conclusions. Academic plagiarism detection tools, such as Turnitin, <a href="http://trends.google.com/trends?q=Turnitin&#038;ctab=0&#038;hl=en&#038;geo=all&#038;date=all&#038;sort=0" title="Turnitin on Google Trends">have shown a steady increase with seasonal dips as school lets out</a>.</p>
<h4>Long Tail Keywords</h4>
<p>Unfortunately, a lot of the keywords most specific to this site such as &#8220;spam blogs&#8221;, &#8220;splogs&#8221;, &#8220;RSS scraping&#8221;, etc. did not have enough data to produce results. Many of these terms are fairly new, created since I started Plagiarism Today, and are not widely used. </p>
<p>It will be interesting to see in a year or two if these keywords start to register then.</p>
<h4>Caveats</h4>
<p>In doing this &#8220;study&#8221; I realize that Google Trends is both a limited and a largely invalid source of data. Not only is the data proprietary, meaning it cannot be vetted, but the information is relative and contains little hard data.</p>
<p>Also, many of the keywords looked at are not ones that typical searchers would use and instead would only be searched for by bloggers. Others, however, were likely searched for by both groups. This means that we may not have an accurate picture of how content creators alone feel about these issues.</p>
<p>The goal of this check was just to get a quick idea of what was going on and what the potential attitudes were.</p>
<h4>Conclusions</h4>
<p>When I personally look at these charts, I draw three conclusions.</p>
<p>First, I see that there is a sharp decrease in the interest of searchers in the legal aspects of copyright. This could be due to greater understanding of copyright, and thus less need to search for it, or it could just be that users have moved on from the early copyright controversies of the late nineties.</p>
<p>Second, there is a clear, if slow, increase in interest in tracking one&#8217;s own content and the non-legal penalties that come from infringing or being infringed. This could be a sign that creators are not thinking about these issues in the light of a legal paradigm, but rather, in a more practical framework.</p>
<p>Finally, it is clear that the interest in plagiarism, both academically and artistically, remains fairly steady and that it remains an issue of interest even after the scandals fade from the headlines.</p>
<p>Personally, this site has seen explosive growth over the past year, both doubling in traffic and enabling me to leave my day job to work full-time as a consultant. Clearly, things are changing in this area.</p>
<p>I look forward to following these changes closely over the coming years.</p>
<p><strong>Note:</strong> All of the graphs in this post are <a href="http://www.google.com/intl/en/trends/about.html#18" title="Google Trends Terms of Use">used with permission from Google</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/07/02/the-popularity-of-plagiarism/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding the Age of a Page</title>
		<link>http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/</link>
		<comments>http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 15:52:16 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google blog search]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1254</guid>
		<description><![CDATA[If you need a quick and easy way to get an idea of when a post went live, there is a Firefox plugin that uses Google to put that information just a click away.]]></description>
			<content:encoded><![CDATA[<p><IMG SRC="http://www.plagiarismtoday.com/images/linkdiagnosis-logo-20080606-104242.png" alt="Link Diagnosis Logo" align="left" class="picleft">One of the more difficult challenges on the Web is determining when a page was created. We simply cannot trust the date and time stamps provided with the content we read, as good guys and bad guys alike <a href="http://www.plagiarismtoday.com/2008/05/27/spam-bloggers-who-backdate/" title="Spam Bloggers who Backdate">change the date of their posts as necessary</a>.</p>
<p>Search engines, however, can provide a much better set of statistics than a site&#8217;s own timestamps. The only issue is that gleaning the needed information can be difficult. Fortunately, a relatively new Firefox plugin entitled <a href="http://www.linkdiagnosis.com" title="Link Diagnosis">Link Diagnosis</a> helps with that by taking the dirty work out of determining when a page was indexed by Google.</p>
<p>The tool, while not perfect, can be a valuable asset when trying to determine approximately when a page appeared on the Web.<br />
<span id="more-1254"></span></p>
<h4>How it Works</h4>
<p><IMG SRC="http://www.plagiarismtoday.com/images/get-page-age-20080606-104402.png" alt="Get Page Age Screenshot" align="right" class="picright">Link Diagnosis is actually a robust plugin designed to analyze incoming links to a URL for SEO purposes. However, as one of its &#8220;hidden features&#8221; it is able to determine, approximately, <a href="http://blog.linkdiagnosis.com/?p=19" title="http://blog.linkdiagnosis.com/?p=19">the day the URL appeared in Google</a>.</p>
<p>To use it, right-click the page you want to check and select the &#8220;Get Page Age&#8221; option. After a few seconds, you are greeted with a JavaScript popup containing the date on which the script detected that the page first appeared.</p>
<p>Behind the scenes, it uses <a href="http://www.googletutor.com/2006/08/22/more-google-hacking-using-the-inurl-operator/" title="Google INURL">Google&#8217;s INURL command</a> which, when used in conjunction with a date filter, causes Google to display a date by each resulting URL. What the plugin does is take the URL you wish to check, create the search query and then automatically extract the applicable date, thus turning a multi-step process into a one-click solution.</p>
<p>For anyone seeking to find out the date of a site, this could prove to be both a powerful tool and a good time saver as well.</p>
<h4>Why to Use It</h4>
<p>There are many reasons why you might want to check out the age of a particular page. </p>
<p>For one, you can use it to check if a spam blog or a plagiarist was indexed by Google before or after your original post (provided it was indexed at all). This can help determine what action you should take against the site. </p>
<p>However, many will also find its non-repudiation services to be very useful. If there ever is a dispute about who posted an article or an image first, this tool can help resolve it by providing an independent view on which went up first.</p>
<p>Though certainly not as accurate as <a href="http://www.numly.com">Numly</a> or <a href="http://www.myfreecopyright.com">MyFreeCopyright</a>, using Google is far more accurate than looking at the <a href="http://www.archive.org">Web Archive</a>, especially considering that the latter can take over six months to display any information about a URL.</p>
<p>Still, Link Diagnosis is far from perfect in this area. There are many issues one will encounter if one tries to rely upon it for non-repudiation.</p>
<h4>Limitations</h4>
<p><IMG SRC="http://www.plagiarismtoday.com/images/page-age-capture-20080606-104544.png" alt="Get Page Age Error" align="left" class="picleft">Before you begin to make heavy use of this service, bear in mind the following caveats:</p>
<p><OL><LI><strong>Google&#8217;s Limitations:</strong> The biggest issue with using the INURL method is that Google does not always index a site or a page immediately after it goes up. There are often delays. Also, the service can only work with pages already in the Google database; anything that has been blacklisted, either by the creator or by Google, will return no results.</LI><br />
<LI><strong>URLs and Not Content:</strong> The function will tell you when the URL appeared in Google, not when the content on the page did. For permalinks that may be acceptable, but for dynamic pages, such as the front page of Plagiarism Today, it can create a problem.</LI><br />
<LI><strong>Different Owners:</strong> Also, the system detects when a URL was first indexed by Google, not who owned it at the time. If a site changes ownership, even if it is taken out of Google during the transition, the date shown for the home page will belong to the original owner. </LI></OL></p>
<p>In short, the tool is subject to the exact same gaming and manipulation that Google and the other search engines are. As such, it can provide some quick and dirty information, especially on permalinks, but should never be taken as the ultimate gospel on the age of a page.</p>
<p>Link Diagnosis is no substitute for a true non-repudiation service and it does not claim to be.</p>
<h4>Conclusions</h4>
<p>Personally, I find the other features of Link Diagnosis much more compelling than its &#8220;page age&#8221; feature. Though it is great for a quick analysis, especially of a spam blog permalink, it may not always tell the complete truth or have the information you are seeking.</p>
<p>It is a great analysis tool but it should not be assumed to be the plain truth. There are plenty of ways that it could be wrong.</p>
<p>So, as with every tool, be sure to use it in conjunction with common sense and logic. Have it available, use it if needed, but don&#8217;t use it as a replacement for your own judgment.</p>
<p>No tool is that powerful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/06/06/finding-the-age-of-a-page/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Spam Bloggers Who Backdate</title>
		<link>http://www.plagiarismtoday.com/2008/05/27/spam-bloggers-who-backdate/</link>
		<comments>http://www.plagiarismtoday.com/2008/05/27/spam-bloggers-who-backdate/#comments</comments>
		<pubDate>Tue, 27 May 2008 15:07:04 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Blogger]]></category>
		<category><![CDATA[Blogspot]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[pagerank]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1087</guid>
		<description><![CDATA[Through a combination of trickery and error, it is often possible for a spam blog to appear to have posted your works before you did. However, what effect does this have on the search engines? The answer is "Not Much".]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/05/blogger-timestamp-unsized.jpg" alt="" title="blogger-timestamp-unsized" width="224" height="61" class="picleft alignleft size-medium wp-image-1089" />A few weeks back, a reader of this site noticed a spam blogger not only scraping his posts, but backdating the entries before re-posting them. The resulting site made it appear as if all of the scraped entries had appeared well before the original ones, possibly tricking both search engines and human readers.</p>
<p>However, in this case, the backdating was unlikely to fool anyone. The date shifting was so severe, usually spanning several weeks, that many of the entries on the spam blog were listed as posted before the events they described and, most likely, were allegedly posted on dates well before the search engine spiders made their last visits.</p>
<p>Still, it is not uncommon to see spam bloggers backdate their scraped posts more conservatively. From a shift of a few hours to account for time zone differences to a day or two to try and appear more legitimate, there are many reasons why a spammer&#8217;s post may appear to go up before your own.</p>
<p>Fortunately though, this is not a major worry for Webmasters. The timestamps we look at are all lies and both search engines and users know that to be the case.</p>
<p><span id="more-1087"></span><br />
<h4>Why Timestamps Lie</h4>
<p>The problem with the timestamps provided by most major blogging platforms is that they are easily changed by users. There are many legitimate reasons why a blogger or Webmaster would want to alter a timestamp. You can forward date a post so that it publishes in your absence, pre-date the post so that it fits into a natural series with related items or set the date to an outlandish time so that it remains at the top of the page.</p>
<p>Even if there is no intentional manipulation of the timestamp by the author, it can still be wrong due to problems with the server, disagreements in time zone and other completely natural issues that can change the date a post or page is listed as going up.</p>
<p>For these reasons, search engines place very little faith in the timestamp of a post when determining which is the original. As such, spammers are unable to simply backdate their scraped posts and claim the top spot in Google.</p>
<p>Fortunately, it is a bit more difficult than that.</p>
<h4>It&#8217;s About Trust</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/05/pt-pagerank-300x102.jpg" alt="" title="pt-pagerank" width="300" height="102" class="picright alignright size-medium wp-image-1090" />If spammers could steal search engine thunder by simply backdating their posts, every spammer would be doing it. However, search engines place much more stock in how much trust the sites involved have, and that is something much more difficult to obtain.</p>
<p>This is something that Andy Beard points out on his site. <a href="http://andybeard.eu/2008/05/why-you-should-nofollow-your-blog-comments.html">In a recent post on his blog</a>, he responded to a <a href="http://www.davidnaylor.co.uk/why-you-should-nofollow-your-blog-comments.html">previous post by David Naylor</a>, using many of the same keywords. Though Beard&#8217;s post both came later and linked to Naylor&#8217;s post in the first paragraph, Beard&#8217;s site was able to claim the top spot in Google for a relevant search term due solely to its search engine authority.</p>
<p>Though the story is anecdotal in nature, it illustrates how Google, and other search engines, award rankings. Ranking is not based merely upon who is first, but rather on which site the search engine trusts more and which site it feels the reader would rather land on.</p>
<p>This makes backdating posts an ineffective tactic for gaining search engine ranking. If Google does not trust your site, it does not matter if your post appears to have come first or even if it truly did; you will not rank well for terms related to it.</p>
<p>While this is good news for many bloggers who are heavily scraped, there are other bloggers that have a great deal to worry about.</p>
<h4>Spammer Trust</h4>
<p>On the upside, if your site is well-established and is generally trusted by the search engines, it has a natural shield against scraping. Search engines are not likely to give a new site more authority than you on a topic, regardless of how they date their posts.</p>
<p>However, statistically speaking, most active blogs are fairly new and have not yet earned that level of authority. As such, they may be very vulnerable to scraping, especially considering that spam bloggers often leverage their networks to build up artificial authority. In the early months of a blog&#8217;s life, it is entirely likely that the spammers scraping its posts may have more authority and trust than the original posts, making it very hard for the site to find its footing.</p>
<p>In short, the problem with authority is that all sites start out with none and that makes them vulnerable to abuse from sites that have any, no matter how little.</p>
<h4>Conclusions</h4>
<p>Bloggers have very little to worry about from &#8220;clever&#8221; spammers that backdate their posts. The search engines place little to no faith in those timestamps and, most likely, human readers don&#8217;t either.</p>
<p>The issue is not who came first, but who carries more trust. The Associated Press, for example, will always carry more trust than a one-month-old blog, and the fact that the blog backdates its posts is irrelevant.</p>
<p>In short, it is more important to cultivate and nurture this kind of relationship than it is to simply be first. This is not just a large part of what prevents spam bloggers from simply taking over the Web, but also part of the reason why new bloggers often struggle with scraping so much more severely than established ones.</p>
<p>It is very important to track and stop blog scraping, especially in the early months of a blog&#8217;s life, to further that trust and ensure that the spammers cannot build an artificial reputation.</p>
<p>After all, the sword cuts both ways. If being first will not help the spammers, it will not help you either. Building and maintaining your authority level is the first and best step to protecting yourself against scraping, but it is one that requires both hard work on building your content and vigilance at keeping the spammers at bay.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/05/27/spam-bloggers-who-backdate/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
	</channel>
</rss>

