<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>workfriendly | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/workfriendly/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Workfriendly Goes Offline</title>
		<link>http://www.plagiarismtoday.com/2008/07/10/workfriendly-goes-offline/</link>
		<comments>http://www.plagiarismtoday.com/2008/07/10/workfriendly-goes-offline/#comments</comments>
		<pubDate>Thu, 10 Jul 2008 14:36:24 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[duplicate-content]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[plagiarim]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[search spam]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[workfriendly]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=1295</guid>
		<description><![CDATA[Safe-surfing site and "accidental scraper" Workfriendly is now offline after more than two years of pushing duplicate content into Google. ]]></description>
			<content:encoded><![CDATA[<p><IMG SRC="http://www.plagiarismtoday.com/images/workfriendlylogo1-20080710-093524.png" alt="Workfriendly Logo" align="left" class="picleft">Workfriendly, a site previously reported on Plagiarism Today <a href="http://www.plagiarismtoday.com/2007/11/09/workfriendly/" title="Workfriendly an Accidental Scraper">back in November 2007</a> and again in <a href="http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/" title="Another Workfriendly Issue">April of this year</a>, stopped functioning sometime within the past few days, bringing an end to the problems it created for many Webmasters.</p>
<p>The site is currently just a &#8220;parked&#8221; domain page running ads for the domain&#8217;s registrar, GoDaddy. According to the <a href="http://whois.domaintools.com/workfriendly.net" title="Workfriendly Whois">whois information for the site</a>, the domain was &#8220;updated&#8221; on the eighth, indicating that it possibly expired and was transferred to another owner. </p>
<p>Workfriendly attempted to disguise Web surfing as a Microsoft Word document by formatting Web pages to appear as text in a Word file while bordering the site content with a fake border designed to look like the application. This was supposed to make it &#8220;safer&#8221; to surf at work as it would raise less suspicion should anyone see your monitor.</p>
<p>The site created problems, however, when it allowed search engines to index its modified pages, injecting many thousands of pages&#8217; worth of duplicate content into Google. It also created headaches by mishandling certain HTML tags, causing links to break on some sites and Google to report those errors as broken links on the original domain.</p>
<p>It is unclear at this time if the outage is temporary or permanent. However, the site has been down for at least two days, making a temporary outage increasingly unlikely. </p>
<p><strong>Hat tip:</strong> Special thanks to <a href="http://www.sciencebase.com/">David Bradley of Sciencebase</a> (stupid typos, thanks for the catch!) for letting me know that Workfriendly is not working.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/07/10/workfriendly-goes-offline/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Workfriendly: Yet Another Issue</title>
		<link>http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/</link>
		<comments>http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 14:56:34 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[errors]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google blog search]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[technorati]]></category>
		<category><![CDATA[workfriendly]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=887</guid>
		<description><![CDATA[Workfriendly, a script that masks the Web to look like an open Microsoft Word document, may have been created as a joke, but it continues to create serious problems for the Webmasters that it scrapes. ]]></description>
			<content:encoded><![CDATA[<p><img class="picleft" style="float: left;" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlylogo.png" alt="WorkFriendly Logo" width="185" height="36" />Back in November of last year, I wrote an article about <a title="WorkFriendly" rel="nofollow" href="http://www.workfriendly.net">Workfriendly</a>, calling it an &#8220;<a title="WorkFriendly as an Accidental Scraper" href="http://www.plagiarismtoday.com/2007/11/09/workfriendly/">accidental scraper</a>&#8221; and accusing the site of allowing search engines to index pages containing scraped content.</p>
<p>The site, which is simply a script that <a href="http://www.diylife.com/2008/03/17/surf-the-web-without-your-boss-knowing/">modifies other sites</a> to look like a document in Microsoft Word, so that one can surf the Web at work without raising suspicion, has <a title="Google Results for WorkFriendly" href="http://www.google.com/search?q=site%3Aworkfriendly.net&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">nearly a quarter of a million URLs referenced in Google</a>, even though only one page, the home page, contains original content.</p>
<p>However, I recently discovered that Workfriendly has another issue, one that causes, in some cases, both users and the search engines to seek out nonexistent URLs, causing 404 errors in very large numbers.</p>
<p>Though it is a problem caused by Workfriendly, it is one that Webmasters and bloggers need to take action to correct if they are vulnerable. Otherwise, the search engines could be steered toward hundreds of non-working URLs on your site, potentially hurting your ranking in them.<br />
<span id="more-887"></span></p>
<h4>Discovering the Problem</h4>
<p><img class="picright" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysucks21.jpg" border="0" alt="workfriendlysucks2.jpg" width="267" height="275" align="right" />I discovered the problem with Workfriendly over the weekend by accident. I logged into my Google Webmaster Tools account to check on any errors I had and was stunned to find over 150 file not found errors.</p>
<p>WordPress typically does a pretty good job avoiding file not found errors so to discover so many on my site, especially with no other errors found, was surprising.</p>
<p>Thinking that, perhaps, my recent update had caused an issue with my permalinks, I looked at the errors themselves. One was caused by me changing the date on a post, another was a server error where the URL worked fine, but the other 149 pointed to a directory that does not exist and has never existed on this server: &#8220;/browse/Office2003Blue/&#8221;.</p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysucks3-2.png" border="0" alt="workfriendlysucks3_2.png" width="550" height="202" /></p>
<p>I remembered that Workfriendly used a similar link structure when you browsed the Web through it. I hopped onto the site and pulled up Plagiarism Today and watched as Workfriendly pulled up the site successfully. Clearly, the ban I had put in place a few months ago had stopped working, likely due to the plugin I was using not being compatible with newer versions of WordPress.</p>
<p><img class="picleft" src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysuck7.jpg" border="0" alt="workfriendlysuck7.jpg" width="368" height="205" align="left" />After pulling up Plagiarism Today in Workfriendly, I hovered my mouse over one of the links and looked at the URL. Indeed, it was pointing to a URL on this server in the nonexistent &#8220;browse&#8221; directory. Clicking the link resulted in chaos in Workfriendly and, in most cases, led to the site loading up without Workfriendly&#8217;s obfuscation.</p>
<p>I immediately set out to block Workfriendly, this time using a hand-coded <a title="How to Block Scrapers with .htaccess" href="http://www.plagiarismtoday.com/2007/07/02/using-htaccess-to-stop-content-theft/">.htaccess block</a>, but not before trying to figure out what was causing the problem.</p>
<h4>Understanding the Issue</h4>
<p>What made the problem perplexing was that it seemed to only be this site that was having the issue. Other sites I tested with Workfriendly worked fine.</p>
<p>However, after I looked at the source code for the page that Workfriendly created, the problem became almost immediately clear.</p>
<p>Plagiarism Today uses a &#8220;base&#8221; tag. It is an HTML tag, placed in the head of a page, that tells search engines and Web browsers what the &#8220;base&#8221; URL of your site is so that, when you use relative links (links that do not begin with an &#8220;http://&#8221;), the browser knows what URL you are pointing to.</p>
<p>It is a good practice for SEO reasons and to help with <a title="Preventing 302 Hijacking" href="http://www.plagiarismtoday.com/2007/06/14/302-hijacking-an-old-danger-made-new-again/">preventing 302 hijacking</a>. Still, most sites do not have one and, in many cases, it isn&#8217;t necessary.</p>
<p>The problem was that Workfriendly, despite having manipulated all of the links on my site, was using relative links for everything. Rather than saying &#8220;http://www.workfriendly.net/browse/&#8230;&#8221; the links simply said &#8220;/browse/&#8230;&#8221;.</p>
<p>When the browser combined those links with the base tag, it converted all of them to &#8220;http://www.plagiarismtoday.com/browse/&#8230;&#8221;, a location that does not exist.</p>
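<p>As a rough illustration of that interaction, the fragment below is a minimal sketch and not Workfriendly&#8217;s actual output. The base tag is written the way a site like this one might declare it, and the link path after &#8220;/browse/Office2003Blue/&#8221; is a hypothetical placeholder:</p>
<pre><code>&lt;head&gt;
  &lt;!-- Base tag carried over from the original page (assumed form) --&gt;
  &lt;base href="http://www.plagiarismtoday.com/" /&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;!-- Rewritten link that starts with a slash instead of a full URL --&gt;
  &lt;a href="/browse/Office2003Blue/example-page"&gt;Example link&lt;/a&gt;
  &lt;!-- With the base tag above, a browser resolves this link to
       http://www.plagiarismtoday.com/browse/Office2003Blue/example-page --&gt;
&lt;/body&gt;</code></pre>
<p>Without the base tag, the same link would resolve against the address of the page being viewed, which is why sites that do not declare one would not see the problem.</p>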
<p>The combination of the base tag and Workfriendly&#8217;s use of relative links was causing the site to throw back URLs that did not exist and, due to the poor use of robots.txt, causing the search engines to pick up those bad links as well.</p>
<h4>An Inconsiderate Script</h4>
<p>My issue with Workfriendly has never been the service itself. Though some could argue that it creates a derivative work of the sites it processes, since the works are never saved, but are rather created dynamically, it is a difficult case to make.</p>
<p>However, more to the point, I am not upset about sites that want to remix or alter the site to make it easier to read. I would not oppose a version better suited for the visually impaired, for mobile browsers or other formats as needed, so long as the site showed basic respect for the content it was displaying.</p>
<p>And that is the problem with Workfriendly. The service shows no consideration for the Webmasters whose content it uses.</p>
<p>For one, the site allows the search engines to index the scraped pages, even though the pages do not exist and are, instead, dynamically generated.</p>
<p>Second, sloppy programming on the site causes it to generate artificial 404 errors that could hurt Webmasters when dealing with the search engines. Fortunately though, since the bad links are on an external site, they likely won&#8217;t have much impact.</p>
<p>However, if Workfriendly had simply used a correct link format, including the &#8220;http://www.workfriendly.net&#8221; before each link or stripping out the Base tag, the issue would not be a problem at all.</p>
<p>But what is perhaps strangest of all is that Workfriendly offers you a script that you can put on your site to direct your visitors to their version of your site. However, in addition to letting your visitors use the Workfriendly service, you may be helping the search engines find your content in their links.</p>
<p>It seems unlikely that is worth the trade-off.</p>
<h4>Conclusions</h4>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2008/04/workfriendlysucks5-1.jpg" border="0" alt="workfriendlysucks5-1.jpg" width="250" height="163" align="right" />Personally, I decided it was time to be done with Workfriendly. I edited my .htaccess file and have banned the server from accessing this site. So far it is the only IP to be completely banned from this domain. If you attempt to access the site from Workfriendly, you will get the message displayed to the right.</p>
<p>If anyone is looking for the code I added to my .htaccess file, I simply put this before any of my WordPress code:</p>
<blockquote><p>order allow,deny<br />
deny from 66.226.27.21<br />
allow from all</p></blockquote>
<p>This certainly isn&#8217;t the type of step I wanted to take, but it was what I felt I was forced to do and, sadly, what I have to encourage others to look at doing too.</p>
<p>But the problem is that, in their bid to create something simple and fun, the creators of Workfriendly made something that poses a real danger to Webmasters and bloggers. Though simple changes to the system could remedy these problems easily, the authors have either neglected or refused to do so.</p>
<p>The result, on this site at least, is that Workfriendly is banned. I have attempted to contact the creators several times in the past but have never received a response. Considering all of the attention that has been paid to the scraping issue, it seems that the creators are either ignoring the criticism or have abandoned the project.</p>
<p>Either way, right now Workfriendly is just another problem for Webmasters and bloggers to worry about.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/04/08/workfriendly-yet-another-issue/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>workFRIENDLY: An Accidental Scraper</title>
		<link>http://www.plagiarismtoday.com/2007/11/09/workfriendly/</link>
		<comments>http://www.plagiarismtoday.com/2007/11/09/workfriendly/#comments</comments>
		<pubDate>Fri, 09 Nov 2007 21:07:20 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[anonymouse]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[fair use. DMCA]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google cache]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[proxies]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[workfriendly]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/11/09/workfriendly/</guid>
		<description><![CDATA[On the surface, workFRIENDLY is something of a novelty site. The idea is pretty simple, you punch in a URL that you want to visit and workFRIENDLY pulls up the site in a format that resembles a Microsoft Word document (see Blog Herald on workFRIENDLY). The idea is that, if you use the site to...]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.divshare.com/download/2699236-c33"><img align="left" hspace="5" src="http://www.divshare.com/img/2699236-c33.png" border="0" /></a>On the surface, <a href="http://www.workfriendly.net/">workFRIENDLY</a> is something of a novelty site. </p>
<p>The idea is pretty simple: you punch in a URL that you want to visit and workFRIENDLY pulls up the site in a format that resembles a Microsoft Word document (<a href="http://www.workfriendly.net/browse/Office2003Blue/www.blogherald.com">see Blog Herald on workFRIENDLY</a>). The idea is that, if you use the site to surf the Web while at work, it will be less suspicious than having a regular browser open should your boss walk by.</p>
<p>However, the simple and somewhat tongue-in-cheek nature of the site belies a potential threat to Webmasters. A <a href="http://www.google.com/search?q=site%3Aworkfriendly.net&#038;ie=utf-8&#038;oe=utf-8&#038;aq=t&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a">Google search of the domain</a> reveals hundreds of thousands of indexed pages, only one of which, the home page, is original in its content. The rest are cached versions of the pages entered into the system.</p>
<p>The result is that workFRIENDLY, probably by accident, has become one of the most prolific scraping sites I&#8217;ve seen and certainly one of the best at getting their results listed in Google.</p>
<p><span id="more-725"></span><strong>What is Going On</strong></p>
<p>Fundamentally, workFRIENDLY is little different than other proxy services on the Web including <a href="http://anonymouse.org/anonwww.html">Anonymouse</a> and even Google cache. What separates workFRIENDLY from these services is that it modifies the look of the page so that it appears to be in the format of a Word document.</p>
<p>This modification of the text raises copyright questions of its own, especially in light of the <a href="http://www.eff.org/deeplinks/2006/01/google-cache-ruled-fair-use">Google cache ruling</a>, which hinged in part on Google&#8217;s lack of modification of the original site, and <a href="http://www.copyright.gov/title17/92chap5.html">section 512(b)</a>, which protects transitory services from infringement claims for hosting infringing material so long as the data is &#8220;transmitted through the system or network without modification of its content.&#8221;</p>
<p>However, for most Webmasters, it is likely to be a technical issue that raises the most concern.</p>
<p>The problem is that most sites like workFRIENDLY use a <a href="http://www.robotstxt.org/">robots.txt</a> file to block search engines from indexing the pages viewers pull up. For example, Anonymouse has a <a href="http://anonymouse.org/robots.txt">robots.txt</a> file that blocks access to their cgi-bin directory, which is where all anonymous browsing takes place. Likewise, Google has a <a href="http://www.google.com/robots.txt">robots.txt</a> that blocks the search directory, which is where it displays its cached copies.</p>
<p>However, workFRIENDLY does not have such a robots.txt file. In fact, as of this writing, the site doesn&#8217;t have one at all. This has caused search engines to index virtually everything they can get their hands on, including over a quarter of a million pages according to Google&#8217;s admittedly flawed estimate and over <a href="https://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fworkfriendly.net&#038;bwm=p&#038;bwms=p&#038;fr=yfp-t-501&#038;fr2=seo-rd-se">one million in Yahoo</a>.</p>
<p>Though not everything workFRIENDLY has ever visited has been indexed, it appears that a good percentage of it has and that the site is not doing anything to stop it. Worse still, some of these pages have started turning up in organic search results, especially for more obscure keywords, and some Webmasters are getting upset about it.</p>
<p><strong>What To Do About It</strong></p>
<p>According to the person who brought this to my attention, who wishes to remain anonymous, she attempted to contact the host of the site, <a href="http://www.webhost4life.com/">WebHost4Life</a>, but was told that the data used to create the cached copies was stored elsewhere. </p>
<p>However, after further investigation, it appears that workFRIENDLY doesn&#8217;t use cached copies at all. Rather, the pages are created dynamically with each visit. I was able to test this by visiting their copy of Plagiarism Today and refreshing the page every few minutes to see if the time of the workFRIENDLY page had changed, even though my site had not.</p>
<p>If the site had used a cache, the timestamp would have stayed the same. As you can see in the images below, it did not.</p>
<p>First capture:</p>
<p><a href="http://www.divshare.com/download/2699177-3bc"><img src="http://www.divshare.com/img/2699177-3bc.png" border="0" /></a></p>
<p>Second capture:</p>
<p><a href="http://www.divshare.com/download/2699186-72c"><img src="http://www.divshare.com/img/2699186-72c.png" border="0" /></a></p>
<p>This creates a strange problem in that there is no content that the host of the site can take down. The site itself is little more than the homepage and the needed scripts to pull down the data for display.</p>
<p>With no page to take down, the host is in a difficult situation. Though they can <a href="http://www.plagiarismtoday.com/2005/12/07/ipowerwebcom-the-nuclear-option/">disable the whole domain</a>, there is no easy way for them to remove just the infringing work. Worse yet, there is no way to simply block workFRIENDLY using meta tags or robots.txt as there is no documentation for their spider.</p>
<p>Instead, the focus shifts to blocking workFRIENDLY from accessing your site, which I was able to achieve using the plugin <a href="http://wordpress.org/extend/plugins/wp-ban/">WP-Ban</a> and the <a href="http://network-tools.com/default.asp?prog=network&#038;host=workfriendly.net">IP address of the site</a>. </p>
<p><a href="http://www.divshare.com/download/2699335-195"><img src="http://www.divshare.com/img/2699335-195.png" border="0" /></a></p>
<p>You can also <a href="http://www.javascriptkit.com/howto/htaccess.shtml">edit your .htaccess file</a> to achieve the same effect.</p>
<p>If that is not practical and you find that your search engine results are threatened or usurped by the duplicate content, you can always file a <a href="http://www.google.com/dmca.html">DMCA notice with Google</a> to get the pages removed. You can also <a href="http://www.google.com/contact/spamreport.html">report spam results</a> if you see workFRIENDLY URLs in the results of other searches you perform. </p>
<p>Though not an ideal solution, it is at least a workable one and can serve as a stopgap until a more permanent answer, or at least some decisive action from Google, takes place.</p>
<p><strong>Conclusions</strong></p>
<p>I want to be clear that I do not think workFRIENDLY is doing anything malicious and, in truth, it may not even be illegal. I believe that they set up this site never expecting these results to be indexed. There is currently no advertising on the site, save the home page, and the site has hardly achieved what one would call great success with the search engines. </p>
<p>Unfortunately, a letter to workFRIENDLY went unanswered for quite some time so I do not have any word from them on this matter. </p>
<p>If the site would simply create a two-line robots.txt file that blocked indexing of the &#8220;browse&#8221; directory, the whole matter would be resolved in a very short period of time. However, as of this writing, they have not done that. </p>
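<p>A file along these lines would likely be enough. It is only a sketch: it assumes every proxied page lives under the &#8220;/browse/&#8221; path, as the workFRIENDLY URLs linked above suggest, and it would need to be served from the root of workFRIENDLY&#8217;s own domain:</p>
<pre><code>User-agent: *
Disallow: /browse/</code></pre>
<p>The first line addresses all crawlers that honor robots.txt, and the second asks them to skip anything under that directory, which would leave the home page indexable.</p>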
<p>The result is that this site has become not just a prolific scraper, but one of the most difficult ones to deal with. </p>
<p>But what is worse about this case is the black eye that this gives Google when it comes to dealing with duplicate content. The fact that a site such as workFRIENDLY can, without any real effort, push hundreds of thousands of duplicate pages into the search results is very worrisome.</p>
<p>Hopefully Google can get around to fixing this issue soon. We have too much to worry about with malicious spammers to spend much time on those who simply make a mistake. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/11/09/workfriendly/feed/</wfw:commentRss>
		<slash:comments>35</slash:comments>
		</item>
	</channel>
</rss>

