<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>anonymouse | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/anonymouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>workFRIENDLY: An Accidental Scraper</title>
		<link>http://www.plagiarismtoday.com/2007/11/09/workfriendly/</link>
		<comments>http://www.plagiarismtoday.com/2007/11/09/workfriendly/#comments</comments>
		<pubDate>Fri, 09 Nov 2007 21:07:20 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Prevention]]></category>
		<category><![CDATA[anonymouse]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[fair use. DMCA]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google cache]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[proxies]]></category>
		<category><![CDATA[Scraping]]></category>
		<category><![CDATA[Spam-Blogs]]></category>
		<category><![CDATA[Splogs]]></category>
		<category><![CDATA[workfriendly]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/11/09/workfriendly/</guid>
		<description><![CDATA[On the surface, workFRIENDLY is something of a novelty site. The idea is pretty simple: you punch in a URL that you want to visit and workFRIENDLY pulls up the site in a format that resembles a Microsoft Word document (see Blog Herald on workFRIENDLY). The idea is that, if you use the site to...]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.divshare.com/download/2699236-c33"><img align="left" hspace="5" src="http://www.divshare.com/img/2699236-c33.png" border="0" /></a>On the surface, <a href="http://www.workfriendly.net/">workFRIENDLY</a> is something of a novelty site. </p>
<p>The idea is pretty simple: you punch in a URL that you want to visit and workFRIENDLY pulls up the site in a format that resembles a Microsoft Word document (<a href="http://www.workfriendly.net/browse/Office2003Blue/www.blogherald.com">see Blog Herald on workFRIENDLY</a>). The point is that, if you use the site to surf the Web while at work, it will look less suspicious than a regular browser window should your boss walk by.</p>
<p>However, the simple and somewhat tongue-in-cheek nature of the site belies a potential threat to Webmasters. A <a href="http://www.google.com/search?q=site%3Aworkfriendly.net&#038;ie=utf-8&#038;oe=utf-8&#038;aq=t&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a">Google search of the domain</a> reveals hundreds of thousands of indexed pages, only one of which, the home page, is original in its content. The rest are cached versions of the pages entered into the system.</p>
<p>The result is that workFRIENDLY, probably by accident, has become one of the most prolific scraping sites I&#8217;ve seen and certainly one of the best at getting its results listed in Google.</p>
<p><span id="more-725"></span><strong>What is Going On</strong></p>
<p>Fundamentally, workFRIENDLY is little different from other proxy services on the Web, including <a href="http://anonymouse.org/anonwww.html">Anonymouse</a> and even Google cache. What separates workFRIENDLY from these services is that it modifies the look of the page so that it appears to be in the format of a Word document.</p>
<p>This modification of the text raises copyright questions of its own, especially in light of the <a href="http://www.eff.org/deeplinks/2006/01/google-cache-ruled-fair-use">Google cache ruling</a>, which hinged in part on Google&#8217;s lack of modification of the original site, and <a href="http://www.copyright.gov/title17/92chap5.html">section 512(b)</a>, which protects transitory services from infringement claims for hosting infringing material so long as the data is &#8220;transmitted through the system or network without modification of its content.&#8221;</p>
<p>However, for most Webmasters, it is likely to be a technical issue that raises the most concern.</p>
<p>Most services like workFRIENDLY use a <a href="http://www.robotstxt.org/">robots.txt</a> file to block search engines from indexing the pages visitors pull up. For example, Anonymouse has a <a href="http://anonymouse.org/robots.txt">robots.txt</a> file that blocks access to its cgi-bin directory, which is where all anonymous browsing takes place. Likewise, Google has a <a href="http://www.google.com/robots.txt">robots.txt</a> that blocks the search directory, which is where it displays its cached copies.</p>
<p>However, workFRIENDLY does not have such a robots.txt file. In fact, as of this writing, the site doesn&#8217;t have one at all. As a result, search engines have indexed virtually everything they can get their hands on, including over a quarter of a million pages according to Google&#8217;s admittedly flawed estimate and over <a href="https://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fworkfriendly.net&#038;bwm=p&#038;bwms=p&#038;fr=yfp-t-501&#038;fr2=seo-rd-se">one million in Yahoo</a>.</p>
<p>Though not everything workFRIENDLY has ever visited has been indexed, it appears that a good percentage of it has and that the site is not doing anything to stop it. Worse still, some of these pages have started turning up in organic search results, especially for more obscure keywords, and some Webmasters are getting upset about it.</p>
<p><strong>What To Do About It</strong></p>
<p>The person who brought this to my attention, who wishes to remain anonymous, attempted to contact the host of the site, <a href="http://www.webhost4life.com/">WebHost4Life</a>, but was told that the data used to create the cached copies was stored elsewhere. </p>
<p>However, after further investigation, it appears that workFRIENDLY doesn&#8217;t use cached copies at all. Rather, the pages are created dynamically with each visit. I was able to test this by visiting their copy of Plagiarism Today and refreshing the page every few minutes to see if the time of the workFRIENDLY page had changed, even though my site had not.</p>
<p>If the site had used a cache, the timestamp would have stayed the same. As you can see in the images below, it did not.</p>
<p>First capture:</p>
<p><a href="http://www.divshare.com/download/2699177-3bc"><img src="http://www.divshare.com/img/2699177-3bc.png" border="0" /></a></p>
<p>Second capture:</p>
<p><a href="http://www.divshare.com/download/2699186-72c"><img src="http://www.divshare.com/img/2699186-72c.png" border="0" /></a></p>
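<p>The refresh test above can be sketched in a few lines of Python. This is only an illustration, not the method used for the captures: <code>looks_dynamic</code> and its <code>fetch_stamp</code> argument are hypothetical names, and the caller is assumed to supply a function that loads the proxied page and returns the timestamp shown on it. As in the manual test, a changed timestamp only suggests dynamic generation if the original page itself did not change between requests.</p>
<pre>
```python
import time


def looks_dynamic(fetch_stamp, pause_seconds=0.0):
    """Read the page's timestamp twice and report whether it changed.

    fetch_stamp is a caller-supplied function (an assumption of this
    sketch) that loads the proxied page and returns its timestamp.
    """
    first = fetch_stamp()
    time.sleep(pause_seconds)  # wait between the two visits
    second = fetch_stamp()
    # A cached copy would keep showing the same timestamp.
    return first != second


if __name__ == "__main__":
    # Stand-in for two real page loads taken a few minutes apart.
    captures = iter(["10:01", "10:04"])
    print(looks_dynamic(lambda: next(captures)))
```
</pre>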
<p>This creates a strange problem in that there is no content that the host of the site can take down. The site itself is little more than the homepage and the needed scripts to pull down the data for display.</p>
<p>With no page to take down, the host is in a difficult situation. Though they can <a href="http://www.plagiarismtoday.com/2005/12/07/ipowerwebcom-the-nuclear-option/">disable the whole domain</a>, there is no easy way for them to remove just the infringing work. Worse yet, there is no way to simply block workFRIENDLY using meta tags or robots.txt as there is no documentation for their spider.</p>
<p>Instead, the focus shifts to blocking workFRIENDLY from accessing your site, which I was able to achieve using the plugin <a href="http://wordpress.org/extend/plugins/wp-ban/">WP-Ban</a> and the <a href="http://network-tools.com/default.asp?prog=network&#038;host=workfriendly.net">IP address of the site</a>. </p>
<p><a href="http://www.divshare.com/download/2699335-195"><img src="http://www.divshare.com/img/2699335-195.png" border="0" /></a></p>
<p>You can also <a href="http://www.javascriptkit.com/howto/htaccess.shtml">edit your .htaccess file</a> to achieve the same effect.</p>
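<p>Whichever tool you use, the underlying check is the same: compare the visitor&#8217;s address against a block list and refuse the request on a match. The Python sketch below shows that logic only. It is not how WP-Ban or .htaccess is implemented, the function names are hypothetical, and the blocked address is a placeholder from a range reserved for documentation rather than the real address of any site.</p>
<pre>
```python
import ipaddress

# Placeholder block list. 203.0.113.0/24 is reserved for documentation,
# so this is NOT the address of any real service.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.7/32")]


def is_blocked(remote_addr, blocked=BLOCKED_NETWORKS):
    """Return True if the visitor address falls inside a blocked network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in blocked)


def status_for(remote_addr):
    """Pick the HTTP status to send: 403 for banned visitors, else 200."""
    return 403 if is_blocked(remote_addr) else 200


if __name__ == "__main__":
    print(status_for("203.0.113.7"))
    print(status_for("198.51.100.1"))
```
</pre>
<p>Storing networks rather than single addresses means a whole range can be banned with one entry if the service moves within its host&#8217;s address block.</p>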
<p>If that is not practical and you find that your search engine results are threatened or usurped by the duplicate content, you can always file a <a href="http://www.google.com/dmca.html">DMCA notice with Google</a> to get the pages removed. You can also <a href="http://www.google.com/contact/spamreport.html">report spam results</a> if you see workFRIENDLY URLs in the results of other searches you perform. </p>
<p>Though not an ideal solution, it is at least a workable one and can serve as a stopgap until a more permanent answer, or at least some decisive action from Google, takes place.</p>
<p><strong>Conclusions</strong></p>
<p>I want to be clear that I do not think workFRIENDLY is doing anything malicious and, in truth, it may not even be illegal. I believe that they set up this site never expecting these results to be indexed. There is currently no advertising on the site, save the home page, and the site has hardly achieved what one would call great success with the search engines. </p>
<p>Unfortunately, a letter to workFRIENDLY went unanswered for quite some time so I do not have any word from them on this matter. </p>
<p>If the site would simply create a two-line robots.txt file that blocked indexing of the &#8220;browse&#8221; directory, the whole matter would be resolved in a very short period of time. However, as of this writing, they have not done that. </p>
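<p>As a rough illustration of such a file, the sketch below feeds a two-line robots.txt to Python&#8217;s standard <code>urllib.robotparser</code> module and asks whether a compliant crawler may fetch a given URL. The exact rules are an assumption based on the &#8220;browse&#8221; path seen in workFRIENDLY URLs, and <code>ExampleBot</code> and <code>allowed</code> are hypothetical names.</p>
<pre>
```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the two-line file; the site publishes no such file.
ROBOTS_TXT = """User-agent: *
Disallow: /browse/
"""


def allowed(url, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The home page stays crawlable; proxied pages under /browse/ do not.
    print(allowed("http://www.workfriendly.net/"))
    print(allowed("http://www.workfriendly.net/browse/Office2003Blue/www.blogherald.com"))
```
</pre>
<p>Note that robots.txt only affects crawlers that choose to honor it, which the major search engines do.</p>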
<p>The result is that this site has become not just a prolific scraper, but one of the most difficult ones to deal with. </p>
<p>But what is worse about this case is the black eye that this gives Google when it comes to dealing with duplicate content. The fact that a site such as workFRIENDLY can, without any real effort, push hundreds of thousands of duplicate pages into the search results is very worrisome.</p>
<p>Hopefully Google can get around to fixing this issue soon. We have too much to worry about with the malicious spammers to spend much time on those who simply make a mistake. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/11/09/workfriendly/feed/</wfw:commentRss>
		<slash:comments>35</slash:comments>
		</item>
		<item>
		<title>Tip: Getting Around an IP Block</title>
		<link>http://www.plagiarismtoday.com/2007/08/24/tip-getting-around-an-ip-block/</link>
		<comments>http://www.plagiarismtoday.com/2007/08/24/tip-getting-around-an-ip-block/#comments</comments>
		<pubDate>Fri, 24 Aug 2007 15:58:23 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Personal Experiences]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[anonymity]]></category>
		<category><![CDATA[anonymouse]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[google-translate]]></category>
		<category><![CDATA[ip-blocking]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[the-onion-router]]></category>
		<category><![CDATA[tor]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/08/24/tip-getting-around-an-ip-block/</guid>
		<description><![CDATA[If you do enough work in dealing with plagiarists, it is bound to happen. A plagiarist, scraper or other ne&#8217;er-do-well, not wanting to deal with your accusations, will simply block your IP from visiting their site in hopes that you&#8217;ll think the site is down and go away. It&#8217;s an amateurish move that is usually...]]></description>
			<content:encoded><![CDATA[<p>If you do enough work in dealing with plagiarists, it is bound to happen. A plagiarist, scraper or other ne&#8217;er-do-well, not wanting to deal with your accusations, will simply block your IP from visiting their site in hopes that you&#8217;ll think the site is down and go away. </p>
<p>It&#8217;s an amateurish move that is usually executed by someone who knows precious little about how the Internet works. There are a million ways around an IP block, even one that filters out a large block of IPs, so it is not an effective tactic over the long haul. However, it is annoying, especially since most methods involve installing software or altering your connection.</p>
<p><span id="more-602"></span>One common method for getting around this is to use an anonymous proxy such as <a href="http://anonymouse.org/anonwww.html">Anonymouse</a>. This works because the site doesn&#8217;t see your visit, but the proxy&#8217;s, thus changing your IP address. However, most larger, open proxies can be easily blocked, and smaller ones pose a security risk since they sit between you and the Internet as you surf.</p>
<p>The easiest, safest and fastest way I&#8217;ve found to get around an IP block to see if a site is really down is to use Google Translate as a proxy. To do so, just follow these steps:</p>
<ol>
<li>Visit <a href="http://translate.google.com">Google Translate</a> (translate.google.com)</li>
<li>Type in or paste the URL you want to visit in the second form.</li>
<li>Make sure that the translation is set to translate the site into English (or whatever language you want to read it in).</li>
<li>Pull up the page as usual.</li>
</ol>
<p>Google Translate will read the page, copy the content and display a temporary cache of it from its own server. If the site is already in English, it does not perform any actual translation. You will see the page as it is, or rather, as Google sees it.</p>
<p>Best of all, almost no one blocks Google, making it almost impossible to actually block this proxy. If one did block it, they would likely also be removing themselves from the Google search index, committing a form of SEO suicide.</p>
<p>If being the subject of an IP block is a regular event, you may want to consider using <a href="http://tor.eff.org/">Tor</a> (AKA: The Onion Router) to alter your visible IP. There is even a <a href="https://addons.mozilla.org/en-US/firefox/addon/2275">Firefox extension</a> to enable and disable Tor functionality at the push of a button.</p>
<p>All in all, if a plagiarist attempts to perform an IP block on you, it is important to remember that you are not dealing with a mastermind or even a respectable opponent, but a rank amateur. </p>
<p>IP blocks can be trivially circumvented; one just has to know where to look in order to defeat them. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/08/24/tip-getting-around-an-ip-block/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
	</channel>
</rss>

