<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>spam-blog | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/spam-blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Cpedia: A Spam Blog Disguised as an Encyclopedia</title>
		<link>http://www.plagiarismtoday.com/2010/06/09/cpedia-a-spam-blog-disguised-as-an-encyclopedia/</link>
		<comments>http://www.plagiarismtoday.com/2010/06/09/cpedia-a-spam-blog-disguised-as-an-encyclopedia/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 16:21:04 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[copyright infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[cpedia]]></category>
		<category><![CDATA[cuil]]></category>
		<category><![CDATA[encyclopedia]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[spam-blog]]></category>
		<category><![CDATA[splog]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=6818</guid>
		<description><![CDATA[CPedia is a new automatically-generated encyclopedia from the makers of Cuil. So why is it throwing thousands of pages of duplicate content into Google?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/06/cpedia-logo.jpg" alt="" title="cpedia-logo" width="255" height="88" class="alignleft size-full wp-image-6825"></p>
<p>Last week, <a href="http://twitter.com/melebeth">@melebeth</a> introduced me to <a rel="nofollow" href="http://cpedia.com">CPedia</a>, a new &#8220;encyclopedia&#8221; by the makers of <a href="http://cuil.com">Cuil</a>, a search engine that was initially greeted with much fanfare before <a href="http://techcrunch.com/2008/12/27/cuil-fail-traffic-nearly-hits-rock-bottom/">seemingly flaming out</a>. </p>
<p>Cpedia is not an encyclopedia in the strictest sense, as it is not written by human beings. Unlike traditional encyclopedias, which are written by paid experts, or Wikipedia, which is written largely by volunteers, Cpedia is generated automatically from search result pages, producing an automated and <a href="http://gigaom.com/2010/04/16/cpedia-founder-errors/">often wildly inaccurate</a> encyclopedia-like page.</p>
<p>For example, <a rel="nofollow" href="http://www.cpedia.com/wiki/Jonathan_Bailey_of_Plagiarism_Today_(all_pages)">I have a Cpedia page</a> as well as <a rel="nofollow" href="http://www.cpedia.com/wiki?q=Plagiarism+Today">one for this site</a>, though my personal page doesn&#8217;t actually say anything about me and the one for PT seems to discuss random people and items only tangentially related to the site.</p>
<p>However, the concern Melebeth approached me with was not just about the accuracy of Cpedia, but about the way it used content from other sources. According to her, the search engine was lifting text directly from third-party sites but not properly quoting or citing it.</p>
<p>So, I delved into Cpedia and found, unfortunately, that her fears were largely founded.<span id="more-6818"></span></p>
<h4>How CPedia Works</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/06/pt-cpedia-sample-217x300.jpg" alt="" title="pt-cpedia-sample" width="217" height="300" class="alignright size-medium wp-image-6827"></p>
<p>The basic idea behind CPedia is that it combs through the search results for a relevant term and tries to build out an encyclopedia entry automatically. Visually, the results are very similar to Wikipedia, but the content is generally more jumbled and difficult to read.</p>
<p>Cpedia does attribute the content it uses, but in a very strange way. If you click or hover your mouse over the text within an article, but not the link, you are shown a sidebar containing the quoted text and a link to its source. If you click the inline text link, you are instead taken to a references page that then links to the original source.</p>
<p>Usually, the individual copied passages are very short, though some approached 100 words, especially when the source passage was broken up into multiple parts.</p>
<p>CPedia seems to cover a nearly unlimited number of topics, likely because its entries are generated automatically, and has many pages that Wikipedia does not, including one for me.</p>
<p>All in all, CPedia is fairly straightforward but that does not mean it isn&#8217;t a problem. In fact, in this case, it means quite the opposite.</p>
<h4>Problems With Cpedia</h4>
<p>Apart from the questionable accuracy of Cpedia, the entire operation, to me, seems highly suspect. The idea of creating new pages of content using snippets from dozens, even hundreds of other pages seems to be a very poor way to do business. </p>
<p>But even setting aside the way the content is created, the attribution alone raises several issues. Consider the following two problems:</p>
<ol>
<li><strong>Always One Step From Source Link:</strong> Whether you hover over the text or click to the references page, you are always one action away from the source link. This means users and other search engines alike are always two steps from the source site even though it would be trivial to make it one.</li>
<li><strong>Lack of Clear Quotes:</strong> The entire entry is made up of short verbatim quotes from various sources but it is not clear where the quotes begin and end without hovering over the text. The goal is to make the entire work seem like an original creation, an actual encyclopedia entry, without much in the way of visible quotes, just traditional footnote citations.</li>
</ol>
<p>However, the bigger problem is actually very simple. There are already many sites that build thousands and thousands of entries using snippets from various other pages. They&#8217;re called spam blogs and they use a variety of article generation and spinning technology to build new articles out of hodgepodges of existing ones.</p>
<p>And Cpedia is acting very much like a spam blog. Entries from CPedia are appearing in Google, <a href="http://www.google.com/search?hl=en&#038;safe=off&#038;client=safari&#038;rls=en&#038;q=site%3Acpedia.com&#038;aq=f&#038;aqi=&#038;aql=&#038;oq=&#038;gs_rfai=">which currently has about 177,000 entries indexed</a>, and though <a rel="nofollow" href="http://cpedia.com/robots.txt">Cpedia&#8217;s robots.txt</a> disallows the wiki directory, it doesn&#8217;t seem to be stopping search engines from indexing the entries.</p>
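<p>One possible explanation is that robots.txt is advisory: a Disallow rule only asks well-behaved crawlers not to fetch matching paths, and Google, for one, has said it may still index a blocked URL that other pages link to, just without crawling its content. The minimal Python sketch below uses the standard library's <code>urllib.robotparser</code> to show how such a rule is matched against URLs. The rule <code>Disallow: /wiki</code> is an assumption, since the article does not quote Cpedia's file, and <code>may_crawl</code> is a hypothetical helper.</p>

```python
from urllib.robotparser import RobotFileParser

# Assumed rule for illustration only; Cpedia's real robots.txt
# is not quoted in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki
"""


def may_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.cpedia.com/wiki/Jonathan_Bailey_of_Plagiarism_Today_(all_pages)",
        "http://www.cpedia.com/wiki?q=Plagiarism+Today",
        "http://www.cpedia.com/info/webmaster_info/",
    ):
        # Disallow rules match by path prefix, so both /wiki URLs are refused.
        print(may_crawl(ROBOTS_TXT, "Googlebot", url), url)
```

<p>Because matching is by path prefix, the assumed rule covers both the <code>/wiki/...</code> entry pages and the <code>/wiki?q=...</code> search-style URLs; a rule written as <code>Disallow: /wiki/</code> would miss the latter. Either way, the check only tells a compliant crawler what not to fetch; it does not remove anything from an index.</p>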
<p>When you factor all of this together, it becomes clear that Cpedia is acting like a spam blog rather than an encyclopedia. Was that the intention? Probably not. But it is how the site is functioning, pumping thousands of pages of poorly-written duplicate content into the major search engines.</p>
<p>If that is not the hallmark of a spam blog, I&#8217;m not sure what is.</p>
<h4>Making it Stop</h4>
<p>To be clear, what Cpedia is doing most likely isn&#8217;t illegal. Fair use would likely protect their very limited use of the content from each individual source. That is one of the reasons the technique is so common among spam blogs: it makes them almost immune to copyright disputes as a means of getting them shut down.</p>
<p>In short, even though the ethics of Cpedia can be hotly debated, most likely they are on the right side of the law.</p>
<p>That being said, if you want your work removed from Cpedia, all you have to do is remove it from Cuil, which can be done by <a rel="nofollow" href="http://www.cpedia.com/info/webmaster_info/">using robots.txt to block &#8220;twiceler&#8221;</a>.</p>
<p>You can also block the IP range that Cuil uses for crawling, which is listed at the same link.</p>
<p>It is a fairly simple change to make. (Note: I have not made it and will not make it on PT; I keep my robots.txt open intentionally to help observe issues like these.)</p>
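<p>The two options above can be sketched in Python as follows. This is an illustration rather than a drop-in configuration: in practice an IP block usually lives in the web server or firewall, not in application code. The user-agent token <code>Twiceler</code> comes from the article; the helper names are hypothetical, and <code>192.0.2.0/24</code> is a reserved documentation range standing in for the real crawl range published on Cpedia's webmaster page.</p>

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# A robots.txt group addressed to the "Twiceler" crawler;
# "Disallow: /" asks that crawler to skip the entire site.
ROBOTS_TXT = """\
User-agent: Twiceler
Disallow: /
"""

# Placeholder: 192.0.2.0/24 is reserved for documentation (RFC 5737).
# Replace it with the crawl range Cuil actually publishes.
BLOCKED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def robots_allows(agent: str, url: str) -> bool:
    """True if the robots.txt above permits this agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def ip_blocked(client_ip: str) -> bool:
    """True if client_ip falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

<p>With these rules, <code>robots_allows("Twiceler", url)</code> returns False for any URL on the site while other crawlers are unaffected, and <code>ip_blocked</code> returns True only for addresses inside the placeholder range. The robots.txt route still depends on the crawler choosing to honor it; the IP route does not.</p>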
<p>All in all, though I disagree strongly with what Cpedia is doing, they do have the right to do it. This makes fighting back trickier, but far from impossible.</p>
<h4>Bottom Line</h4>
<p>What Cpedia is doing, in my opinion, is unethical. They are using quotes from various sites without adequate clarity or attribution. They are pumping thousands of pages of admittedly duplicate content into other search engines and are producing an encyclopedia that, by their own admission, is wildly inaccurate.</p>
<p>Though copyright may not be a viable litigation route, I have to wonder how libel law will apply in this case, as repeating a libel is, generally, <a href="http://www.dancingwithlawyers.com/freeinfo/libel-slander-mis-information.shtml">the same as making the libelous statement</a>. In short, those admitted inaccuracies in Cpedia could, in theory, come back to bite the company at a later date.</p>
<p>Search engine liability in cases of libel is still being settled around the world (<a href="http://newsinfo.inquirer.net/breakingnews/infotech/view/20100425-266422/Google-fined-for-pedophile-libel-against-priest">Google won such a claim in the UK</a>), but republishing this information on your own site while admitting it is inaccurate seems to open up new avenues for liability.</p>
<p>Would this be a likely claim against Cuil/Cpedia? Probably not, but only because the audience for the site is so small that it seems unlikely many will care. The fact that Cuil/Cpedia has seen so little success is a big part of why webmasters haven&#8217;t noticed the spammy nature of the site and taken up arms.</p>
<p>Indeed, Cpedia flew under my radar until Melebeth asked me about it. I imagine it is doing the same for many others right now as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/06/09/cpedia-a-spam-blog-disguised-as-an-encyclopedia/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Copyright 2.0 Show &#8211; Episode 149</title>
		<link>http://www.plagiarismtoday.com/2010/04/30/copyright-2-0-show-episode-150/</link>
		<comments>http://www.plagiarismtoday.com/2010/04/30/copyright-2-0-show-episode-150/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 18:39:25 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Podcast]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[global grind]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS scraping]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[spam-blog]]></category>
		<category><![CDATA[Splogging]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=6522</guid>
		<description><![CDATA[It is Friday again and that means that it is time for another episode of the Copyright 2.0 Show. It is a very special week for the Copyright 2.0 Show as we spend the hour on just one news story, the Global Grind controversy originally reported on by Patrick O&#8217;Keefe, the esteemed co-host of the show....]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/04/globalgrind-logo.jpg" alt="" title="globalgrind-logo" width="209" height="68" class="alignleft size-full wp-image-6471"></p>
<p>It is Friday again and that means that it is time for another episode of the Copyright 2.0 Show.</p>
<p>It is a very special week for the Copyright 2.0 Show as we spend the hour on just one news story, <a href="http://www.plagiarismtoday.com/2010/04/27/global-grind-copies-content-publishes-it-to-google-news/">the Global Grind controversy</a> originally <a href="http://www.patrickokeefe.com/2010/04/26/global-grind-copies-content-submits-it-to-google-news/">reported on by Patrick O&#8217;Keefe</a>, the esteemed co-host of the show. We also debuted a new chatroom, <a href="http://www.plagiarismtoday.com/podcast">which can be found here</a>, and had a very long, involved discussion with those who dropped by for the show.</p>
<p>It was a great show and we hope to see you there every Wednesday at 6 PM ET for the live recording!</p>
<p>In this show we covered:</p>
<ul id="null">
<li>The Background of the Global Grind Case</li>
<li>What Has Been Done About It</li>
<li>How Global Grind Has Responded</li>
<li>What Affected Webmasters Can Do</li>
<li>What&#8217;s Next for the Case</li>
<li>And many more&#8230;</li>
</ul>
<p>You can <a href="http://recordings.talkshoe.com/TC-22590/TS-352308.mp3">download the MP3 file here</a> (direct download). Those interested in subscribing to the show can do so via <a href="http://www.copyright20.com/podcasts/rss">this feed</a>.</p>
<p><a href="http://www.diigo.com/list/plagiarismtoday/episode-149">Show Notes</a></p>
<h4>About the Hosts</h4>
<p><strong>Jonathan Bailey</strong></p>
<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/06/jonathan-box-150x150.png" alt="jonathan-box" title="jonathan-box" width="150" height="150" class="alignleft size-thumbnail wp-image-3842"></p>
<p>Jonathan Bailey (<a href="http://twitter.com/plagiarismtoday">@plagiarismtoday</a>) is the Webmaster and author of Plagiarism Today (Hint: You&#8217;re there now) and works as a copyright and plagiarism consultant. Though not an attorney, he has resolved over 700 cases of plagiarism involving his own work and has helped countless others protect their work and develop strategies for making their content work as hard as possible toward their goals.</p>
<p><strong>Patrick O&#8217;Keefe</strong></p>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/06/patrick.jpg" alt="patrick" title="patrick" width="150" height="150" class="alignright size-full wp-image-3848"></p>
<p>Patrick O&#8217;Keefe (<a href="http://twitter.com/iFroggy">@iFroggy</a>) is the owner of the <a href="http://www.ifroggy.com">iFroggy Network</a>, a network of websites covering various interests. He&#8217;s the author of the book <a href="http://www.managingonlineforums.com/">&#8220;Managing Online Forums,&#8221;</a> a practical guide to managing online communities and social spaces. He maintains a blog about online community management at <a href="http://www.managingcommunities.com/">ManagingCommunities.com</a> and a personal blog at <a href="http://www.patrickokeefe.com/">patrickokeefe.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2010/04/30/copyright-2-0-show-episode-150/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://recordings.talkshoe.com/TC-22590/TS-352308.mp3" length="64267493" type="audio/mpeg" />
		</item>
		<item>
		<title>Fav.Or.It Site Shuts Down</title>
		<link>http://www.plagiarismtoday.com/2009/08/07/fav-or-it-site-shuts-down/</link>
		<comments>http://www.plagiarismtoday.com/2009/08/07/fav-or-it-site-shuts-down/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 17:29:55 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[favorit]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[spam-blog]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[tweetmeme]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=4299</guid>
		<description><![CDATA[Famous content aggregator Fav.or.it is closed, much to the relief of at least some in the blogging community.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/08/favorit-logo-300x68.png" alt="favorit-logo" title="favorit-logo" width="300" height="68" class="alignleft size-medium wp-image-4301" /></p>
<p><strong>Article Updated:</strong> See Below. </p>
<p><a href="http://www.crunchbase.com/company/favorit">Fav.or.it</a>, a content aggregation service, had generated a great deal of controversy among bloggers. The site would collect content from various RSS feeds, in at least some cases including the full content, and display it on its own site, offering visitors the chance to comment on and discuss the news away from the original site. This caused some to accuse Fav.or.it of using <a href="http://www.inquisitr.com/1116/when-did-splogging-become-a-business-model-favorit/">splogging as a business model</a> and earned it several mentions on this site, <a href="http://www.plagiarismtoday.com/2009/03/03/excerpts-scraping-and-fair-use/">including this one</a>.</p>
<p>However, earlier this week, Fav.or.it went down. The initial message said that the site had been taken down for &#8220;maintenance&#8221;. However, after a few days, the message was changed to read the following:</p>
<blockquote><p>&#8220;Recently we decided that we would not continue with the fav.or.it service on our site. For more details about this please take a look on our blog. As a result we are replacing the site with our company site!&#8221;</p></blockquote>
<p>In short, the Fav.or.it service is no more. It is down, never to return, and the site is being replaced by a home page for the company, also called Fav.or.it, which also runs products such as <a href="http://tweetmeme.com/">Tweetmeme</a> (a connection I was unaware of until the site changed) and <a href="http://">TweetTabs</a> (Note: The link to TweetTabs on the Fav.or.it site is currently broken).</p>
<p>The sudden closure of the site seemed odd, so I emailed Fav.or.it to ask what had happened. However, I have yet to receive a response.</p>
<p>(Note: I have a suspicion as to why the site might have gone down but do not wish to say anything more until I get confirmation.)</p>
<h4>Fav.or.it Moves On</h4>
<p>Though the closure of its flagship service may seem like a major blow, it could be a very good thing for the company in the long run. They have already moved on to other, more popular and less copyright-questionable products, the best known of which is Tweetmeme, which is used on this site.</p>
<p>This may help Fav.or.it focus their time and resources on those projects, rather than a small, legally-dubious and <a href="http://siteanalytics.compete.com/fav.or.it+tweetmeme.com/">much less popular content aggregator</a>.</p>
<p>In the end, I don&#8217;t harbor any real ill feelings toward the company as they seem to have created new products that manage to be both useful and raise far fewer copyright issues. Abandoning Fav.or.it as a service was a good move for both themselves and Webmasters everywhere.</p>
<p>As such, I have no qualms about keeping the Tweetmeme feature on this site and don&#8217;t think others should either.</p>
<p>I just hope that this serves as a learning experience for Fav.or.it as a company and they are able to build and grow from this controversy.</p>
<h4>Update: 08/10/09</h4>
<p>Fav.or.it, the company, has posted a <a href="http://blog.fav.or.it/2009/08/favorit-is-dead-long-live-favorit/">short blog entry about the closure of the site</a> that talks about the various reasons for its closure including a shift in the commenting marketplace that made it difficult for them to compete, a lack of updates to the service and a series of implementation mistakes that caused it to lag behind other competitors, including Lazyfeed.</p>
<p>The post did address the content reuse issues and said the following:</p>
<blockquote><p>The site has also not been without controversy for re-use of content (through public RSS feeds), and although we put massive effort into support of licensing models (such as auto-detection of creative commons) our approach to aggregation of content for which we could not detect a license, and that required the publisher to opt-out (rather than opt-in) was in hindsight misguided.</p></blockquote>
<p>In short, Fav.or.it, the company, acknowledges they made mistakes with the way they used content within the system and seem to have learned from them with Tweetmeme. However, they did stop short of issuing a full apology. Though I don&#8217;t think that will satisfy those most disgusted with Fav.or.it, there still seems to be much rejoicing that the site is done for, both because of its poor reuse of content and because its closure increases the focus on Tweetmeme.</p>
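<p>The license auto-detection and opt-in versus opt-out distinction mentioned in the quoted blog post can be illustrated with a short Python sketch. Fav.or.it's actual implementation is not described, so this is only an example under stated assumptions: it assumes a feed declares its license through the Creative Commons RSS module, which is one convention among several, and the function names are hypothetical. It applies an opt-in policy, skipping any feed with no detectable license.</p>

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by the Creative Commons RSS module for RSS 2.0 (an
# assumption about how a feed declares its license). Other conventions,
# such as Atom rel="license" links, are not handled in this sketch.
CC_RSS_NS = "http://backend.userland.com/creativeCommonsRssModule"


def detect_license(rss_xml: str) -> Optional[str]:
    """Return the channel-level license URL, or None if none is declared."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return None
    node = channel.find(f"{{{CC_RSS_NS}}}license")
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


def may_aggregate(rss_xml: str) -> bool:
    """Opt-in policy: republish only when some license was detected.

    A real system would also need to honor that license's terms
    (attribution, non-commercial, share-alike) before reusing content.
    """
    return detect_license(rss_xml) is not None
```

<p>Under an opt-out policy, by contrast, the <code>None</code> case would be treated as permission to republish, which is the approach the company now describes as misguided.</p>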
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2009/08/07/fav-or-it-site-shuts-down/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Copyright 2.0 Show &#8211; Episode 19 &#8211; McTakeDown</title>
		<link>http://www.plagiarismtoday.com/2007/08/13/copyright-20-show-episode-19-mctakedown/</link>
		<comments>http://www.plagiarismtoday.com/2007/08/13/copyright-20-show-episode-19-mctakedown/#comments</comments>
		<pubDate>Mon, 13 Aug 2007 15:25:11 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Podcast]]></category>
		<category><![CDATA[Bay-TSP]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[linux-veoh-perfect10]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[sco]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[spam-blog]]></category>
		<category><![CDATA[splog]]></category>
		<category><![CDATA[viacom]]></category>
		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/08/13/copyright-20-show-episode-19-mctakedown/</guid>
		<description><![CDATA[It&#8217;s Monday again and that means it is time for another 40-minute episode of the Copyright 2.0 show. This week the show is filled to the brim with the usual copyright news, humor and sarcasm that have made the show so special. Also included is a special birthday announcement and my pathetic attempt to rewrite...]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s Monday again and that means it is time for another 40-minute episode of the Copyright 2.0 show. This week the show is filled to the brim with the usual copyright news, humor and sarcasm that have made the show so special. Also included is a special birthday announcement and my pathetic attempt to rewrite my own history.</p>
<p>All in all, it was a busy week in copyright news: a total of sixteen stories were covered, including the following:</p>
<ul id="null">
<li>SCO lost much of its Linux copyright infringement suit</li>
<li>Eight more dogpile onto YouTube</li>
<li>Veoh launches a preemptive strike</li>
<li>Perfect 10 is at it again</li>
<li>Google mistakes its own blog as spam, deletes it</li>
<li>And many more&#8230;</li>
</ul>
<p>You can <a href="http://go.numly.com/1847107081310184589">download the MP3 file here</a>. Those interested in subscribing to the show can do so via <a href="http://www.copyright20.com/podcasts/rss">this feed</a>.</p>
<p><a href="http://del.icio.us/copyright20/19">Show Notes</a></p>
<p>I also want to take a moment to link to <a href="http://arstechnica.com/articles/culture/plagiarism-and-falsified-data-slip-into-the-scientific-literature.ars">this story on Ars Technica</a> dealing with plagiarism issues in the scientific community. It is a great read and I wanted to cover it on the broadcast but there simply wasn&#8217;t any time. </p>
<p>[audio:http://go.numly.com/1847107081310184589]</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/08/13/copyright-20-show-episode-19-mctakedown/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
<enclosure url="http://go.numly.com/1847107081310184589" length="8055849" type="audio/mpeg" />
		</item>
		<item>
		<title>Is Blogger on the Offensive Against Spam?</title>
		<link>http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/</link>
		<comments>http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/#comments</comments>
		<pubDate>Tue, 26 Jun 2007 15:30:11 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Blogger]]></category>
		<category><![CDATA[Blogspot]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[spam-blog]]></category>
		<category><![CDATA[splog]]></category>
		<category><![CDATA[Sploggers]]></category>
		<category><![CDATA[Splogs]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/</guid>
		<description><![CDATA[As part of running this site, I subscribe to many different Technorati Watchlists. They help me keep up to date on the latest in content-theft and plagiarism-related issues. Unfortunately, I see a great many spam blogs on these watchlists. What&#8217;s worse, it can be hard to tell, when looking at my...]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.plagiarismtoday.com/2007/06/28/update-google-responds-regarding-blogspot-spam/">Updated Information Here</a></p>
<p>As part of running this site, I subscribe to many different <a href="http://www.technorati.com/watchlist/">Technorati Watchlists</a>. They help me keep up to date on the latest in content-theft and plagiarism-related issues. </p>
<p>Unfortunately, I see a great many spam blogs on these watchlists. What&#8217;s worse, it can be hard to tell, when looking at my RSS reader, which blogs are legitimate and which are junk. Thus, I often end up clicking through to the splogs that successfully penetrate Technorati&#8217;s armor.</p>
<p>Most of those spam blogs have, traditionally, been on Blogspot. However, over the past week or so, I&#8217;ve noticed that a lot of the Blogspot links have been returning a notice indicating that the blog has been locked down for &#8220;Possible Blogger terms of service violations&#8221;.</p>
<p>It appears, at least based on the sample I have, that Blogger is on a major offensive against spam blogs and that its effectiveness has gone up drastically over the past week or so. If true, this could be great news for bloggers, especially those on Google&#8217;s service, but more research is needed before a victory can be claimed.</p>
<p>I am looking into this matter and am trying to find out exactly what is going on. It could just be that Google has discovered the network responsible for most of the spam targeting my keywords and all of this is a fluke. However, I wanted to pose the question to everyone reading this: Have you noticed a reduction in spam from Blogspot?</p>
<p>I&#8217;ll be interested to hear if others are having similar experiences. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/06/26/is-blogger-on-the-offensive-against-spam/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Using Creative Commons to Stop Scraping</title>
		<link>http://www.plagiarismtoday.com/2007/06/05/using-creative-commons-to-stop-scraping/</link>
		<comments>http://www.plagiarismtoday.com/2007/06/05/using-creative-commons-to-stop-scraping/#comments</comments>
		<pubDate>Tue, 05 Jun 2007 17:50:35 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[cc]]></category>
		<category><![CDATA[cc-licenses]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Creative-Commons]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[spam-blog]]></category>
		<category><![CDATA[splog]]></category>
		<category><![CDATA[Splogging]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/06/05/using-creative-commons-to-stop-scraping/</guid>
		<description><![CDATA[Many sites, including this one, have expressed concerns that CC licenses may be encouraging or enabling scraping. The problem seems to be straightforward. If a blog licenses all of their content under a CC license, then a scraper that follows the terms of said license is just as protected as a human copying one or...]]></description>
			<content:encoded><![CDATA[<p>Many sites, <a href="http://www.plagiarismtoday.com/2005/12/13/creative-commons-license-to-splog/">including this one</a>, have <a href="http://www.blogmaverick.com/2005/12/10/attack-of-the-splogs-revisited/">expressed concerns</a> that CC licenses may be encouraging or <a href="http://openswitch.org/journal/copyright-and-the-blogger">enabling scraping</a>. </p>
<p>The problem seems to be straightforward. If a blog licenses all of their content under a CC license, then a scraper that follows the terms of said license is just as protected as a human copying one or two works. This may be within the letter of the license, but it violates the spirit of Creative Commons.</p>
<p>However, after talking with <a href="http://creativecommons.org/about/people#21">Mike Linksvayer</a>, the Vice President of Creative Commons, I&#8217;m relieved to say that is not the case. CC licenses have several built-in mechanisms that can prevent such abuse.</p>
<p>In fact, when one looks at the future of RSS, it is quite possible that using a CC license might provide better protection than using no license at all. </p>
<p><span id="more-509"></span><strong>Against the Spirit: A Crisis with the Commons?</strong></p>
<p>Whether some scrapers target CC-licensed material is up for debate; what is clear is that, when they do, it is often a source of frustration.</p>
<p>People choose CC licenses because they want to share their work with others. They want to participate in a cultural revolution and give their ideas new wings. They do not, generally, want to see their entire site mirrored elsewhere, surrounded by Adsense ads and depriving them of traffic.</p>
<p>Ideally, a CC license is supposed to be symbiotic. The licensor gives up certain rights to their work and the licensee, in exchange for use of the work, makes certain the original author gets due credit and is rewarded for his or her effort. Spam bloggers, however, approach the CC license in bad faith, taking as much as they can while giving as little as possible back.</p>
<p>This has prompted many CC license users to either drop or alter their license. It has become common for sites that are being scraped to <a href="http://www.micropersuasion.com/2005/12/blog_content_th.html">change their licenses to &#8220;non-commercial&#8221;</a>, stop using CC licenses or even shut down their sites altogether.</p>
<p>However, though some spam blogs may follow the terms of the Creative Commons licenses, even if only by accident, the vast majority do not. In fact, even allowing commercial use of your work is not an open invitation to be scraped.</p>
<p>As it turns out, CC licenses have built-in mechanisms that can be used to fight that kind of abuse.</p>
<p><strong>Where Computers Fear to Tread</strong></p>
<p>According to Linksvayer, for the use of a CC-licensed work to be valid, the following terms, among others, must be met:</p>
<ol>
<li>The work must be attributed and it must provide a link back to the copyright holder.
</li>
<li>If the license is non-commercial, then the work must be used accordingly.
</li>
<li>If a license has a share-alike term attached to it, then the copied work must carry the same license.</li>
<li>All CC-licensed material must state that it is licensed as such, usually with a statement that says &#8220;This work is licensed under a Creative Commons License&#8221;. Failure to do so puts the reuse in violation. </li>
<li>Finally, the licensor has the right to request removal of their name from any reused content. Failure to comply puts the reuse in violation of the license.</li>
</ol>
<p>The problem with all of this is that it is almost impossible for an automated scraping system to comply with all of these elements. </p>
<p>Though some spam bloggers do attribute and link back, most do not. All spam blogging, at least in theory, is commercial use and thus a violation of the non-commercial license. And, since none of the spam blogs I have seen carry over CC license information, they are in violation of the attribution and share-alike requirements of the Creative Commons licenses.</p>
<p>Even if a scraper manages to comply with the first four mechanisms above, it is unlikely to remove the author&#8217;s name from the reused work when asked. Spammers, seeking to automate their operations, are unlikely to edit their spam blogs by hand to appease one copyright holder.</p>
<p>The result is that virtually all automated scraping and spam blogging is a violation of the Creative Commons License, regardless of what license is used.</p>
<p><strong>Technicalities and Human Error</strong></p>
<p>Some of these requirements, however, are relatively obscure. Though most people understand what is and is not acceptable under the various CC licenses, many of the nuances, such as the fourth mechanism, are little known and rarely followed, even by humans seeking to play fair.</p>
<p>However, most copyright holders, often in the dark about the requirements themselves, do not hold human copiers to these standards. So long as they get the attribution and reuse that they envision, they typically do not raise any alarms if there isn&#8217;t a &#8220;This work is licensed under&#8230;&#8221; statement in the reused content.</p>
<p>The question is whether it is fair to hold scrapers to a higher standard than we generally hold other people. While it is certainly the right of the copyright holder to decide which misuses of their work to pursue, many will likely feel uneasy about using largely unenforced technicalities against spam bloggers.</p>
<p>But even those who feel uneasy about enforcing those elements of the Creative Commons license may still benefit from applying one, especially to their feed. With some very difficult questions about copyright and RSS feeds unanswered, having a defined license on your feed could become critical.</p>
<p><strong>Implied Licenses and RSS</strong></p>
<p>Though most attorneys I know and have spoken with feel that <a href="http://www.plagiarismtoday.com/2007/01/29/twil-discusses-implied-licenses-on-rss-feeds/">there is no implied license to scrape and republish RSS feeds</a>, the question has not yet come before a court and the outcome, as with all cases pushing new territory, is unpredictable at best.</p>
<p>However, if it is determined that RSS scraping and republishing is legal and that posting an RSS feed creates such an implied license, attorney Denise Howell feels that any implied license can be <a href="http://betweenlawyers.corante.com/archives/2006/01/21/rss_and_copyright_the_no_example.php">overridden by a defined one</a>, such as a Creative Commons License. </p>
<p>This makes sense considering that an implied license is one <a href="http://www.bitlaw.com/copyright/license.html#implied">designed to operate when no actual agreement</a> exists between the parties. If a specific license is posted, it would override the implied license.</p>
<p>We already see this on the Web. The courts have found that posting a Web page to the Internet creates an implied license for it to be indexed and cached by the search engines. However, once you state that you do not want the page used in that manner, whether through meta tags, robots.txt or a manual opt-out, the implied license no longer applies and the search engines, legitimate ones at least, have to comply with your request.</p>
<p>The Creative Commons organization is working on ways of <a href="http://wiki.creativecommons.org/Syndication">integrating CC licenses into RSS feeds</a>. Hopefully this issue will garner more attention as the legal issues mount, and a final draft can be fleshed out.</p>
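<p>As a rough illustration only, one existing way to declare a license inside a feed is the Creative Commons RSS module originally published by UserLand, which adds a <code>creativeCommons:license</code> element holding the URL of the license. The snippet below is a sketch that assumes that module and a CC Attribution 3.0 license. It is not necessarily the format Creative Commons will settle on, and the blog title and example.com URLs are placeholders.</p>
<pre><code>&lt;rss version="2.0"
  xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"&gt;
  &lt;channel&gt;
    &lt;title&gt;Example Blog&lt;/title&gt;
    &lt;link&gt;http://www.example.com/&lt;/link&gt;
    &lt;description&gt;Placeholder feed&lt;/description&gt;
    &lt;!-- Channel-level license declaration (placeholder license URL) --&gt;
    &lt;creativeCommons:license&gt;http://creativecommons.org/licenses/by/3.0/&lt;/creativeCommons:license&gt;
    &lt;item&gt;
      &lt;title&gt;Example post&lt;/title&gt;
      &lt;link&gt;http://www.example.com/example-post/&lt;/link&gt;
    &lt;/item&gt;
  &lt;/channel&gt;
&lt;/rss&gt;</code></pre>
<p>The appeal of a machine-readable declaration like this is that the license travels with every copy of the feed, so a scraper cannot plausibly claim it had no notice of the terms.</p>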
<p><strong>Conclusions</strong></p>
<p>The bottom line is that Creative Commons does not encourage or permit blind RSS scraping and spam blogging. Though a CC license may still be useful for legitimate aggregation, it provides a great deal of protection against scraping, much more than previously thought.</p>
<p>Whether these mechanisms prove useful in fighting scraping remains to be seen. However, there is no longer a reason to hold back on a DMCA notice or a copyright complaint just because your CC license allows commercial use and seems to permit the scraping. Unless the scraper followed all of the requirements above, the use is still invalid.</p>
<p>Hopefully this will encourage the wider use of CC licenses, specifically the use of more liberal ones. I myself have removed the non-commercial requirement from my CC license as, like many others, my primary concern was commercial use by scrapers.</p>
<p>In the end, this is just another example of how Creative Commons, when used correctly, can work well for everyone and, in many cases, is good copyright policy. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/06/05/using-creative-commons-to-stop-scraping/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

