<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>technorati | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/technorati/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 17:55:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Workfriendly: Yet Another Issue</title>
		<link>http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/</link>
		<comments>http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 14:56:34 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[errors]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google blog search]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[technorati]]></category>
		<category><![CDATA[workfriendly]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=887</guid>
		<description><![CDATA[Workfriendly, a script that masks the Web to look like an open Microsoft Word document, may have been created as a joke, but it continues to create serious problems for the Webmasters that it scrapes. ]]></description>
			<content:encoded><![CDATA[<p><img class="picleft" style="float: left;" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlylogo.png" alt="WorkFriendly Logo" width="185" height="36" />Back in November of last year, I wrote an article about <a title="WorkFriendly" rel="nofollow" href="http://www.workfriendly.net">Workfriendly</a>, calling it an &#8220;<a title="WorkFriendly as an Accidental Scraper" href="http://www.plagiarismtoday.com/2007/11/09/workfriendly/">accidental scraper</a>&#8221; and accusing the site of allowing search engines to index pages containing scraped content.</p>
<p>The site, which is simply a script that <a href="http://www.diylife.com/2008/03/17/surf-the-web-without-your-boss-knowing/">modifies other sites</a> to look like a document in Microsoft Word, so that one can surf the Web at work without raising suspicion, has <a title="Google Results for WorkFriendly" href="http://www.google.com/search?q=site%3Aworkfriendly.net&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">nearly a quarter of a million URLs referenced in Google</a>, even though only one page, the home page, contains original content.</p>
<p>However, I recently discovered that Workfriendly has another issue, one that, in some cases, causes both users and the search engines to seek out nonexistent URLs, producing 404 errors in very large numbers.</p>
<p>Though it is a problem caused by Workfriendly, it is one that Webmasters and bloggers need to take action to correct if they are vulnerable. Otherwise, the search engines could be steered toward hundreds of non-working URLs on your site, potentially hurting your ranking in them.<br />
<span id="more-887"></span></p>
<h4>Discovering the Problem</h4>
<p><img class="picright" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysucks21.jpg" border="0" alt="workfriendlysucks2.jpg" width="267" height="275" align="right" />I discovered the problem with Workfriendly over the weekend by accident. I logged into my Google Webmaster Tools account to check on any errors I had and was stunned to find over 150 file not found errors.</p>
<p>WordPress typically does a pretty good job avoiding file not found errors so to discover so many on my site, especially with no other errors found, was surprising.</p>
<p>Thinking that, perhaps, my recent update had caused an issue with my permalinks, I looked at the errors themselves. One was caused by me changing the date on a post, another was a server error where the URL worked fine, but the other 149 pointed to a directory that does not and has never existed on this server &#8220;/browse/Office2003Blue/&#8221;.</p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysucks3-2.png" border="0" alt="workfriendlysucks3_2.png" width="550" height="202" /></p>
<p>I remembered that Workfriendly used a similar link structure when you browsed the Web through it. I hopped onto the site and pulled up Plagiarism Today and watched as Workfriendly pulled up the site successfully. Clearly, the ban I had put in place a few months ago had stopped working, likely because the plugin I was using was not compatible with newer versions of WordPress.</p>
<p><img class="picleft" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysuck7.jpg" border="0" alt="workfriendlysuck7.jpg" width="368" height="205" align="left" />After pulling up Plagiarism Today in Workfriendly, I hovered my mouse over one of the links and looked at the URL. Indeed, it was pointing to URLs on this server in the nonexistent &#8220;browse&#8221; directory. Clicking the link resulted in chaos in Workfriendly and, in most cases, led to the site loading up without Workfriendly&#8217;s obfuscation.</p>
<p>I immediately set out to block Workfriendly, this time using a hand-coded <a title="How to Block Scrapers with .htaccess" href="http://www.plagiarismtoday.com/2007/07/02/using-htaccess-to-stop-content-theft/">.htaccess block</a>, but not before trying to figure out what was causing the problem.</p>
<h4>Understanding the Issue</h4>
<p>What made the problem perplexing was that it seemed to only be this site that was having the issue. Other sites I tested with Workfriendly worked fine.</p>
<p>However, after I looked at the source code for the page that Workfriendly created, the problem became almost immediately clear.</p>
<p>Plagiarism Today uses a &#8220;base&#8221; tag. It is an HTML tag, placed in the head of the page, that tells Web browsers and search engines what the &#8220;base&#8221; URL of your site is so that, when you use relative links (links that do not begin with an &#8220;http://&#8221;), the browser knows what URL you are pointing to.</p>
<p>It is a good practice for SEO reasons and to help with <a title="Preventing 302 Hijacking" href="http://www.plagiarismtoday.com/2007/06/14/302-hijacking-an-old-danger-made-new-again/">preventing 302 hijacking</a>. Still, most sites do not have one and, in many cases, it isn&#8217;t necessary.</p>
<p>The problem was that Workfriendly, despite having manipulated all of the links on my site, was using relative links for everything. Rather than saying &#8220;http://www.workfriendly.net/browse/&#8230;&#8221; the links simply said &#8220;/browse/&#8230;&#8221;.</p>
<p>When the browser combined those relative links with the base tag, it converted all of the links to &#8220;http://www.plagiarismtoday.com/browse/&#8230;&#8221;, a URL that does not exist.</p>
<p>The combination of the base tag and Workfriendly&#8217;s use of relative links was causing the site to throw back URLs that did not exist and, due to the poor use of robots.txt, causing the search engines to pick up those bad links as well.</p>
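<p>To illustrate the mechanism, here is a minimal Python sketch of how a link is resolved against a base URL, using the standard library&#8217;s <code>urljoin</code>. The page path after &#8220;/browse/Office2003Blue/&#8221; and the <code>resolve</code> helper are hypothetical, chosen only for illustration. This is an approximation of what a browser does, not Workfriendly&#8217;s actual code.</p>

```python
from urllib.parse import urljoin

# Base URL declared by the page's base tag (this site, per the article).
BASE_HREF = "http://www.plagiarismtoday.com/"

# Hypothetical proxied links: the "example-page" path is an assumption.
relative_link = "/browse/Office2003Blue/example-page"
absolute_link = "http://www.workfriendly.net/browse/Office2003Blue/example-page"


def resolve(base_href, href):
    """Resolve href against base_href, roughly as a browser would."""
    return urljoin(base_href, href)


# The relative link ends up pointing at the original site, not the proxy.
print(resolve(BASE_HREF, relative_link))
# A fully qualified link ignores the base and stays on the proxy.
print(resolve(BASE_HREF, absolute_link))
```

<p>Under these assumptions, the relative link resolves to a &#8220;/browse/&#8221; URL on plagiarismtoday.com, which is the kind of address that produced the 404 errors, while the fully qualified link is left pointing at workfriendly.net.</p>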
<h4>An Inconsiderate Script</h4>
<p>My issue with Workfriendly has never been the service itself. Though some could argue that it creates a derivative work of the sites it processes, since the works are never saved, but are rather created dynamically, it is a difficult case to make.</p>
<p>However, more to the point, I am not upset about sites that want to remix or alter the site to make it easier to read. I would not oppose a version better suited for the visually impaired, for mobile browsers or other formats as needed, so long as the site showed basic respect for the content it was displaying.</p>
<p>And that is the problem with Workfriendly. The service shows no consideration for the Webmasters whose content it uses.</p>
<p>For one, the site allows the search engines to index the scraped pages, even though the pages do not exist and are, instead, dynamically-generated.</p>
<p>Second, sloppy programming on the site causes it to generate artificial 404 errors that could hurt Webmasters when dealing with the search engines. Fortunately though, since the bad links are on an external site, they likely won&#8217;t have much impact.</p>
<p>However, if Workfriendly had simply used a correct link format, including the &#8220;http://www.workfriendly.net&#8221; before each link or stripping out the Base tag, the issue would not be a problem at all.</p>
<p>But what is perhaps strangest of all is that Workfriendly offers you a script that you can put on your site to direct your visitors to their version of your site. However, in addition to letting your visitors use the Workfriendly service, you may be helping the search engines find your content in their links.</p>
<p>It seems unlikely that is worth the trade off.</p>
<h4>Conclusions</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysucks5-1.jpg" border="0" alt="workfriendlysucks5-1.jpg" width="250" height="163" align="right" />Personally, I decided it was time to be done with Workfriendly. I edited my .htaccess file and have banned the server from accessing this site. So far it is the only IP to be completely banned from this domain. If you attempt to access the site from Workfriendly, you will get the message displayed to the right.</p>
<p>If anyone is looking for the code I added to my .htaccess file, I simply put this before any of my WordPress code:</p>
<blockquote><p>order allow,deny<br />
deny from 66.226.27.21<br />
allow from all</p></blockquote>
<p>This certainly isn&#8217;t the type of step I wanted to take, but it was what I felt I was forced to do and, sadly, what I have to encourage others to look at doing too.</p>
<p>But the problem is that, in their bid to create something simple and fun, the creators of Workfriendly made something that poses a real danger to Webmasters and bloggers. Though simple changes to the system could remedy these problems easily, the authors have either neglected or refused to do so.</p>
<p>The result, on this site at least, is that Workfriendly is banned. I have attempted to contact the creators several times in the past but have never received a response. Considering all of the attention that has been paid to the scraping issue, it seems that the creators are either ignoring the criticism or have abandoned the project.</p>
<p>Either way, right now Workfriendly is just another problem for Webmasters and bloggers to worry about.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Why Blog Searching Fails</title>
		<link>http://www.plagiarismtoday.com/2008/03/07/why-blog-searching-fails/</link>
		<comments>http://www.plagiarismtoday.com/2008/03/07/why-blog-searching-fails/#comments</comments>
		<pubDate>Fri, 07 Mar 2008 17:51:48 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Punditry]]></category>
		<category><![CDATA[blog search]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google blog search]]></category>
		<category><![CDATA[icerocket]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Search-Engines]]></category>
		<category><![CDATA[searching]]></category>
		<category><![CDATA[technorati]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2008/03/07/why-blog-searching-fails/</guid>
		<description><![CDATA[Blog searching has become an important tool both for gathering the information I use on this site and for staying on top of content theft issues. Unfortunately, blog searching has deteriorated to the point of uselessness and I break down why. ]]></description>
			<content:encoded><![CDATA[<p><img SRC="http://aycu40.webshots.com/image/45679/2002442769408806476_rs.jpg" align="left" class="picleft"/>Blog searching has been something of the holy grail for the new Web. Countless companies and search engines have worked to tap the endless stream of information that is the blogging world to deliver useful results in near-real time.</p>
<p>Bloggers, in turn, have used these tools to keep on top of the various topics, check for misuse of their content, seek out related posts and look for other sites referencing them.  </p>
<p>Many even use the tools without realizing. WordPress users, for example, use blog search every time they load their dashboard and see their &#8220;incoming links&#8221;. Others use it when they embed <a href="http://technorati.com/help/tags.html">Technorati Tags</a> into their posts. </p>
<p>However, blog searching is not working and it is getting worse. Where it once returned respectable results, today it throws out far more noise than signal. This has made blog search nearly impossible to use for most practical purposes, leaving the search for content theft as the only effective use of the technology and, even then, only in certain situations.</p>
<p>Unless something is done to fix blog search, it is only a matter of time before the technology is left completely by the wayside.<br />
<span id="more-839"></span></p>
<h4>My History</h4>
<p><img SRC="http://aycu12.webshots.com/image/48171/2002420190824064829_rs.jpg" align="right" class="picright"/>When I first started Plagiarism Today nearly three years ago, I started using blog search extensively. I subscribed to a series of related <a href="http://www.technorati.com">Technorati</a> watchlists and used my RSS reader to keep track of the latest happenings in the field of copyright and content theft. As time went on, I added RSS feeds from other search engines including <a href="http://www.icerocket.com">Icerocket</a> and <a href="http://www.google.com/blogsearch">Google Blog Search</a>. </p>
<p>Initially the system worked pretty well. Though there were some garbage results, for the most part every item in my RSS reader warranted opening in a browser. However, over time, the amount of noise grew at a much faster rate than the number of legitimate posts, eventually outpacing it three or four times over.</p>
<p>Now, I feel as if I am inundated by these watchlists, getting dozens of results per hour, but only a fraction of which are actually original, human-written articles.</p>
<p>As a result, I spend as much time per day filtering out the junk from my feeds as I do responding to and bookmarking legitimate articles. </p>
<p>The system has become the model of inefficiency and is borderline useless. However, looking through my feeds, I think I know where much of the noise is coming from and why this problem has crept up on us.</p>
<h4>Issues With Blog Search</h4>
<p>When performing an autopsy on my blog search results recently, I noticed that there were six types of posts that were cluttering up my feeds and suffocating the original content.</p>
<ol>
<li><strong>Spam Blogs:</strong> Splogs are an easy target to blame for blog search clutter but they are not the worst players, truth be told. Since most simply parrot posts already available elsewhere in the feed, they are easy to ignore. Still, the nature of spam blogs pretty much ensures that every post I want to read is repeated two or three more times over the coming hours and days.
</li>
<li><strong>Non-Blogs:</strong> When blog searching started, pretty much only blogs had RSS feeds. Today, RSS feeds are included in forums, twitter pages, social networking profiles and more. Sadly, all of these things are routinely showing up in blog search results. While often relevant in terms of keywords, they are difficult to comment on and rarely provide any news.</li>
<li><strong>Comment Feeds:</strong> It seems that none of the blog search engines have been good at recognizing the difference between the main feed and the comment feed. I routinely see &#8220;Comment On&#8221; posts in my RSS reader. Typically they are posts I&#8217;ve already posted a comment to or the post is very old and is simply receiving a trickle of comments or trackbacks.</li>
<li><strong>Old Posts:</strong> It is a strange but increasingly common issue where the results that get returned are anything but timely. I regularly get posts marked as &#8220;new&#8221; that are really several weeks old. This morning, for example, one of my blog search feeds had two posts talking about the candidates &#8220;fighting for Texas&#8221; and reporting the <a href="http://www.plagiarismtoday.com/2008/02/20/the-obama-plagiarism-scandal/">allegations of plagiarism against Obama</a> as new. The posts were dated in late February. </li>
<li><strong>Repeated Posts:</strong> I have RSS feeds from three of the major blog search engines so I fully expect to see the same post a few times on the different feeds. However, I regularly see the same post repeated on the same feed, sometimes even days or weeks later. Many times I click on a post, thinking it is new, and am stunned that I commented on it weeks ago and a quick check shows that it showed up on the same feed when it was brand new.</li>
<li><strong>Foreign Language Posts:</strong> Finally, even though my search terms are in English, I regularly get results where the bulk of the post is in another language. Though these blogs are often merely spam blogs using automatic translation, they are still of little use to me as I can&#8217;t read them, at least not without some translation, which none of the blog search engines seem to provide.</li>
</ol>
<p>This is not to say that all of the search engines suffer from the problems above equally. Google Blog Search, for example, is inundated with spam blogs and non-blog results, Technorati is worst about repeating posts and delivering content too late while Icerocket seems to return a large number of foreign language posts. </p>
<p>Of course, these aren&#8217;t the only things that have been limiting the usefulness of my blog search tools, but they are six of the biggest players to be certain. If I were going to start fixing blog search, these are the places I would start.</p>
<h4>Stinky Swiss Cheese</h4>
<p><img SRC="http://aycu03.webshots.com/image/48362/2002435910069554829_rs.jpg" align="left" class="picleft"/>Of course, the problem with blog search isn&#8217;t just that its results are cluttered. I&#8217;d be willing to deal with a large amount of clutter in order to stay on top of the relevant news. Unfortunately, it is a regular occurrence that I miss stories, including those that the feed should have picked up.</p>
<p>I can offer no explanation for why this has happened; many of the sites are available if you perform a manual search, so there should be no reason they are missed. </p>
<p>Fortunately, since I do use multiple search engines, the problem is not as severe as if I had used only one. But even with all of these layers, some sites do not show up and the problem baffles me. </p>
<p>Of the three I use regularly, Icerocket seems to be the worst at picking up all of the articles I need, even regularly missing updates from Plagiarism Today, but the problem, most of the time, seems somewhat random and hard to pin on one or two engines.</p>
<h4>Effects on Content Theft</h4>
<p>When it comes to detecting content theft, many of the most popular tools, including <a href="http://www.plagiarismtoday.com/2007/05/24/copyfeed-plugin-now-available-in-english/">Copyfeed</a> and the <a href="http://www.plagiarismtoday.com/2006/10/05/update-digital-fingerprint-plugin-beta-2/">Digital Fingerprint Plugin</a>, use blog search engines to detect misuse of the RSS feed.</p>
<p>In that regard, the fact that spam blogs and duplicate posts routinely show up in blog search engines is a good thing. It means that the detection is more reliable and that the odds of the bad guys showing up are good.</p>
<p>Unfortunately, the fact that they can reliably appear in the results also encourages spam blogging as an activity. Though blog search engines are not the ultimate target of sploggers, the fact that junk content routinely gets indexed in them does not discourage spammers. This means that, while we will see a large percentage of the scrapers, there will also be many more of them.</p>
<p>Furthermore, with the other issues raised with blog search engines, it appears that we&#8217;ll be stumbling through the clutter far more than we&#8217;ll be battling the bad guys. This could, in the long run, actually hinder our ability to locate and deal with spam bloggers, by not only throwing up excesses of noise, but making it impossible to target the ones that pose the greatest threat.</p>
<h4>Conclusions</h4>
<p>What is striking is how different the blog search results are compared to the traditional search engines. Though Google and Yahoo! may take a few more days to pick up their results, they are almost completely free of spam blogs and seem, overall, to present relevant content in an organized manner.</p>
<p>Where Google Blog Search is a dismal failure, the main Google search engine is a triumph. This shows not just the different challenges each search engine faces, but a lack of focus and effort on creating the best blog search results possible.</p>
<p>It is clear that blog search has not been given the attention it needs to thrive. The companies involved, even those who have blog search as their sole business, have decided that it is not worth dedicating the resources toward fixing these issues.</p>
<p>Sadly, these are not new problems that only unveiled themselves recently. They are ongoing issues that have simply grown larger and larger to the point that they now drag the entire system down. Where once they were minor nuisances begging to be nipped in the bud, now they are huge problems that feel almost too big to tackle.</p>
<p>Sadly, it is only a matter of time before blog search becomes completely useless, if it isn&#8217;t there already, and it is a problem that many of us have seen coming for a very long time. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/03/07/why-blog-searching-fails/feed/</wfw:commentRss>
		<slash:comments>37</slash:comments>
		</item>
		<item>
		<title>RSS Brief: Another Scraping/Spam Threat</title>
		<link>http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/</link>
		<comments>http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/#comments</comments>
		<pubDate>Fri, 14 Sep 2007 16:08:42 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[icerocket]]></category>
		<category><![CDATA[Pay-Per-Post]]></category>
		<category><![CDATA[PPP]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[RSS-Brief]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogging]]></category>
		<category><![CDATA[Splogging]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[technorati]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/</guid>
		<description><![CDATA[Yesterday, the makers of the controversial Pay Per Post service launched a new tool designed to make blog reading faster, RSS Brief. The idea is that the service takes long posts, like what you might expect here on Plagiarism Today, and condenses them down into a few short sentences. Though the service sounds convenient and...]]></description>
			<content:encoded><![CDATA[<p>Yesterday, the makers of the controversial <a href="http://payperpost.com/">Pay Per Post service</a> <a href="http://www.blogherald.com/2007/09/13/payperpost-launches-rss-brief-alpha/">launched a new tool</a> designed to make blog reading faster, <a href="http://www.rssbrief.com/">RSS Brief</a>. </p>
<p>The idea is that the service takes long posts, like what you might expect here on Plagiarism Today, and condenses them down into a few short sentences. </p>
<p>Though the service sounds convenient and useful, it also raises significant copyright and spam issues that the company has not addressed as of yet. </p>
<p>Though the service is only in alpha, the time to consider these issues is now, before the service is completed and becomes an active part in many people&#8217;s blogging lives and it is too late to change course.</p>
<p><span id="more-653"></span><strong>How it Works&#8230; In Brief</strong></p>
<p><a href="http://www.rssbrief.com/about">The idea behind RSS Brief</a> is pretty simple. You punch in the URL of your favorite blog, RSS Brief will read the entries in the feed and use what its creators refer to as &#8220;natural language technology&#8221; to parse the text down to a few sentences.</p>
<p>The idea is that, unlike traditional truncating that simply cuts off everything but the first few sentences, you will receive an effective summary of the post. This should, in theory, allow you to get the basic idea of the post and move on.</p>
<p>The technology, however, is questionable at this point. Plagiarism Today&#8217;s RSS Brief page shows some of the weaknesses. Though PT is the type of site targeted by this service, it utterly fails to give a meaningful summary of any of the stories in the RSS feed. Instead, on most stories, it seems to simply do the kind of truncating it claimed to avoid. </p>
<p>However, finding glitches in alpha-stage technology is not as disturbing as the copyright and spam issues that this service raises. It seems that, in the rush to create this service, the programmers completely avoided any consideration of the copyright issues it might raise and of how their technology might be abused.</p>
<p><strong>Copyright Issues</strong></p>
<p>What RSS Brief does, fundamentally, is take a lengthy post and make a derivative work of it. Under copyright law, the creation of derivative works is the sole right of the copyright holder. </p>
<p>Though there is a decent fair use argument for RSS Brief in that the use is largely transformative and only takes a small portion of the original, there is a strong argument against them as well. Their use of the work, by their own design, takes the heart of the original material, it does so for a commercial purpose, and RSS Brief is designed to replace the original work, thus damaging the market for the author&#8217;s work, especially if the author has ads in the feed.</p>
<p>Worse still, the service continues to &#8220;summarize&#8221; even shorter works, some as short as sixty words. This severely raises the amount of the original work used and lowers the likelihood that the use will be found fair.</p>
<p>However, most damning of all is the 1841 case <a href="http://www.faculty.piercelaw.edu/redfield/library/Pdf/case-folsom.marsh.pdf">Folsom v. Marsh</a> (PDF) that found the following when dealing with the issue of &#8220;transformative&#8221; use:</p>
<blockquote><p>(if a user) cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.</p></blockquote>
<p>Though it is impossible to predict whether or not a use will be deemed &#8220;fair&#8221; until it goes before a judge and/or jury, there seems to be a lot of reason to doubt whether or not RSS Brief will pass muster in that situation. </p>
<p>Most damning of all are its stated attempt to replace the original work and its lack of any opt-out mechanism, such as the one Google uses to ensure its cache is fair use. </p>
<p><strong>The Spam Issue</strong></p>
<p>Though many readers would love an &#8220;important parts only&#8221; feed, so would spammers. Fortunately for them, RSS Brief offers up just such a feed on their service, one that essentially scrapes, processes and rebroadcasts the original feed in their &#8220;brief&#8221; format.</p>
<p>Spammers will, most likely, grow to love these feeds. Not only are they keyword rich and to the point, but can easily be combined with other feeds from the same service to create rapid-fire blogs with short posts, something search engines seem to love.</p>
<p>Already spammers take advantage of Technorati, Icerocket and Google Blog Search feeds for much the same purpose. They enjoy the keyword density those feeds provide and the fact that they raise fewer copyright issues than scraping full feeds.</p>
<p>Though an RSS Brief feed might be less keyword rich, it would also be much more modified from the original, making it harder for search engines and Webmasters to spot. Depending on the nature of the spammer, they might find these RSS Brief feeds preferable to the existing alternatives. </p>
<p>Also, much like the search feeds, RSS Brief strips out any and all digital fingerprints as well as copyright information contained in the feed. Its rush to get to &#8220;just the facts&#8221; causes it to leave out some elements that are critical to bloggers. This also makes the use of RSS Brief feeds impossible to track, unless they report usage to FeedBurner, and leaves Webmasters in the dark about how many are subscribing to the feed and how they are using it. </p>
<p>Finally, since Pay Per Post is not a search company, it&#8217;s not in a position to punish people who do scrape their feeds. Technorati and Google can blacklist sites that scrape their search results; Pay Per Post has no such card to play. </p>
<p>If spammers aren&#8217;t already looking at RSS Brief as a new tool, they likely will be soon. They seem to seize on new technology as fast as they can and I doubt this service will be any exception.</p>
<p><strong>Conclusions</strong></p>
<p>As interesting as the idea of RSS Brief is, it is poorly executed. As of this writing, there is no means for Webmasters to opt out, no clear safeguards against spam blogging and no consideration to Webmasters. </p>
<p>Though Pay Per Post has always been a controversial company, they have always been a company that seemed to value bloggers and the role they play, albeit in a somewhat backhanded way. That is why it seems so odd to me that they created this service with so little consideration to them.</p>
<p>One day they are paying bloggers for reviews, the next they are taking their feeds, without permission or an opt out mechanism, and creating derivative works to be redistributed over RSS.</p>
<p>Hopefully they can get these issues as well as their technical glitches straightened out. The idea is interesting but pursuing it in the way they are doing it is very dangerous to them, to bloggers and to the Internet at large.</p>
<p>It borders on irresponsible and if Pay Per Post is going to change their image, they need to put the good of the Web and of bloggers first. They made that mistake when they first launched their primary service and it seems that history is, in a strange way, repeating itself.</p>
<p>Hopefully that won&#8217;t be the case.</p>
<p><strong>Note:</strong> If there is an interest in an excerpt-only &#8220;just the facts&#8221; feed for this site, I will create one. WordPress has the tools to do that and I&#8217;ll simply create the second feed this weekend. If interested, please post a comment below or send me an email.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/09/14/rss-brief-another-scrapingspam-threat/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Reporting Content Theft and Blacklisting Plagiarists</title>
		<link>http://www.plagiarismtoday.com/2007/04/12/reporting-content-theft/</link>
		<comments>http://www.plagiarismtoday.com/2007/04/12/reporting-content-theft/#comments</comments>
		<pubDate>Thu, 12 Apr 2007 22:23:46 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[blogstamp]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[digital-fingerprint]]></category>
		<category><![CDATA[icerocket]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[technorati]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/04/12/reporting-content-theft/</guid>
		<description><![CDATA[SoloSEO, an SEO company from Utah, announced on their blog that they are starting a database of sites that plagiarize content and are asking other Webmasters to contribute to it. The idea is that, in addition to shaming and drawing attention to sites that steal, the list can eventually be used, in conjunction with plugins,...]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.soloseo.com">SoloSEO</a>, an SEO company from Utah, <a href="http://www.soloseo.com/blog/2007/04/12/report-content-theft/">announced on their blog</a> that they are <a href="http://www.soloseo.com/content-theft.html">starting a database of sites that plagiarize content</a> and are asking other Webmasters to contribute to it.</p>
<p>The idea is that, in addition to shaming and drawing attention to sites that steal, the list can eventually be used, in conjunction with plugins, to help blacklist sites that engage in such behavior. Ideally, it could be used to blacklist trackbacks and pings from a given site as well as, possibly, keep the sites from being listed in search engines.</p>
<p>Michael Jensen, the creator of the list, said he hopes the list will be used to &#8220;point out the people who dirty the web on the backs of smart and talented individuals&#8221; and, in the process, make the Web a little bit more free from content theft and spam blogs.</p>
<p>However, though these are lofty and noble goals, the service, as it stands, has a long way to go before it becomes as useful as it is intended to be.</p>
<p><span id="more-470"></span><strong>What it Is</strong></p>
<p>As of this writing, the list is nothing impressive. It is a collection of fifteen pages across nine domains, all reported by Jensen himself, that have been accused of scraping. It also has a form into which a user can paste the URL of an infringing page, the URL of the original page and their name (though anonymous posting is allowed).</p>
<p>One interesting feature of the site is that, next to each domain, is a link to the site&#8217;s whois information. However, this information is often wrong or anonymous. In cases of subdomains, such as with blogspot.com accounts, the information points to the original company, not the plagiarist. It also doesn&#8217;t seem to work with all registrars, prompting users to visit a second or third page to get the full information.</p>
<p>Still, the feature is potentially useful with top-level domains and using a different whois checker, such as my favorite <a href="http://www.domaintools.com/">Domain Tools</a>, might produce better results.</p>
<p>All in all though, the current list is not much to look at. The promise, however, is more in what it could be molded into later.</p>
<p><strong>What it Could Become</strong></p>
<p>Jensen is actively seeking help in turning this list into something greater. Specifically, he is seeking out <a href="http://wordpress.org/extend/plugins/">WordPress plugin</a> authors who might be willing to create a reporting system that submits duplicate content trackbacks to the list at the click of a mouse. This might also be a natural addition to <a href="http://www.maxpower.ca/wordpress-plugin-digital-fingerprint-detecting-content-theft/2006/09/25/">MaxPower&#8217;s Digital Fingerprint plugin</a>, which helps detect scraped content on the Web.</p>
<p>Another interesting idea might be to create a <a href="https://addons.mozilla.org/en-US/firefox/browse/type:1">Firefox extension</a> that can be used to report plagiarist sites that turn up during either traditional searches or via <a href="http://www.google.com/alerts">Google Alerts</a>. This would be a great help to Webmasters who don&#8217;t use WordPress or who monitor their content through other means.</p>
<p>In addition to making input easier, the output could be made more useful by adding plagiarist sites to various blacklists, including ones that block trackbacks and pings. <a href="http://akismet.com/">Akismet</a>, the anti-comment spam service, could potentially benefit from this kind of information as could blog search engines such as <a href="http://www.technorati.com">Technorati</a> and <a href="http://www.icerocket.com">Icerocket</a>. Also, major pinging services, such as <a href="http://rpc.weblogs.com/">Weblogs.com</a>, would be able to avoid passing on pings for known scrapers.</p>
<p>All in all, there is a lot of potential for this list or one like it. However, before it can achieve its full potential, there are several problems that will have to be overcome.</p>
<p><strong>Problems with the Service</strong></p>
<p>Though the idea is very interesting and has a lot of exciting potential uses, SoloSEO&#8217;s list has a long way to go before it makes a major impact on the Web.</p>
<p>First, as of right now, there is no verification of new entries posted. The user submits an infringing URL, an original one and their name (if they wish) and the post is added. Where a DMCA notice requires full contact information, a statement under penalty of perjury and other safeguards, a notice to this list is automatic and can be anonymous. There is no protection against false information and abuse.</p>
<p>The only safeguard that is offered is an email address to contact in the event that false information is posted. However, this type of &#8220;after the fact&#8221; resolution would be small comfort if this, or any similar list, were to play a vital role in stamping out duplicate content on the Web. Such a system would almost certainly be abused to list sites that were either unpopular or held controversial views, effectively chilling a portion of their speech.</p>
<p>Second, the public nature of this list is disturbing. I&#8217;ve written before about <a href="http://www.plagiarismtoday.com/2005/08/04/the-shame-game-why-mob-justice-doesnt-work/">the Shame Game</a> and the reasons why mob justice doesn&#8217;t work. Shame can be a deterrent, but it also leads to other problems including defamation issues, vigilantism and inaccurate information. Not only can such a list be abused by plagiarists to turn their infringement around on the original authors, but it is easy to make a simple mistake when pasting links and accidentally blacklist yourself.</p>
<p>These drawbacks almost always outweigh the usefulness of going public with tales of plagiarism.</p>
<p>Third, spammers and scrapers are able to create new accounts far faster than they can be reported. In the time it takes a user to report a scraper, the software that made the spam site can make a dozen more. This makes such a service a matter of chasing one&#8217;s own tail, always being a few steps behind the worst offenders.</p>
<p>Finally, such a list does nothing to get the content removed. Where a DMCA notice or a cease and desist letter might get the site removed from the Web, or at least the search engines, this list simply shames the plagiarist a bit and, possibly, directs other sites to ignore it. The content still remains.</p>
<p>However, despite these issues, there is at least some hope for this list, provided it is able to evolve in a slightly different direction.</p>
<p><strong>A New Direction</strong></p>
<p>The idea behind this list is sound and it might be able to play a role in stopping or at least deterring plagiarism on the Web. However, before that happens, there will have to be several changes:</p>
<ol>
<li><strong>Better Accountability:</strong> The site needs to make its users more accountable for what they say. Anonymous posting is enticing, but also very dangerous and ripe for abuse. Having users register for an account before reporting plagiarists would help, as it would make it easier to ban users who are obviously abusing the system.</li>
<li><strong>Verified Information:</strong> Rather than simply accepting that the information given is valid, the list needs to do at least some rudimentary evaluation. Granted, this is a difficult task, but it might be an area that <a href="http://www.blogstamp.org/">Blogstamp</a> can help with. (<a href="http://www.plagiarismtoday.com/2007/03/28/blogstamp-certified-blog-timestamps/">previous coverage</a>)</li>
<li><strong>Private Information:</strong> Making the list publicly available and exposing personal information opens the site up to libel suits and deters only a small percentage of plagiarists. It would be best if this information were hidden from public view, though perhaps included in private blacklists, until it can be independently verified. In that regard, Blogwerx has an <a href="http://www.plagiarismtoday.com/2006/06/05/product-preview-blogwerx-sentinel/">interesting approach to dealing with this issue</a>.</li>
<li><strong>Better Resolution:</strong> Rather than just shotgunning the information out there, it would be better if the site worked, in some manner, as an intermediary to resolve the issues. Offering a means for the accused plagiarist to contact the original author would be a step in that direction.</li>
<li><strong>Greater Cooperation:</strong> Clearly, as Jensen has stated, the service is going to need better cooperation both in getting material in and in making use of what&#8217;s there. There&#8217;s the potential to create a goldmine of information here that can help deal with scraping and plagiarism, but tools will need to be created on both sides to make that happen.</li>
</ol>
<p>If those steps were taken, it would not be an ideal service and there would still be more progress to be made, but the service would at least be well on its way.</p>
<p><strong>Conclusions</strong></p>
<p>There have been other attempts to create blacklists of spam blogs and scrapers, <a href="http://www.splogspot.com/">Splogspot</a> being one example, but most have flamed out (<a href="http://www.splogspot.com/recent">including Splogspot itself</a> apparently). The difficulties of creating a useful list, maintaining it and funding it are almost always too much to bear. Furthermore, with spammers creating sites faster than they can be reported, the usefulness of any such list is dubious.</p>
<p>Overall, it appears that the key lies in hosts keeping spam blogs and scrapers off of their servers, as <a href="http://www.plagiarismtoday.com/2007/04/09/why-wordpresscom-is-virtually-spam-free/">WordPress.com seems to have done</a>, and shutting down the networks at their source by cutting off funding.</p>
<p>However, a list such as this one might be useful in reducing trackback spam and giving search engines forewarning of a spam blog that is operating. That could not only mitigate the <a href="http://www.plagiarismtoday.com/2006/12/21/google-addresses-duplicate-content/">duplicate content penalty</a>, but also prevent scrapers from <a href="http://www.plagiarismtoday.com/2007/03/06/gaming-technorati/">gaming the blog search engines</a>.</p>
<p>What remains to be seen, though, is whether this new list will be widely used by other bloggers or remain Jensen&#8217;s personal &#8220;Hall of Shame&#8221; style site.</p>
<p>If it takes off and is well supported, it may become a valuable tool. Otherwise, it may just remain a personal site used to out plagiarists of Jensen&#8217;s own literary efforts.</p>
<p>Either way, it would be interesting to see if such an effort could be useful and what impact it would have.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/04/12/reporting-content-theft/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>

