<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DNS | Plagiarism Today</title>
	<atom:link href="http://www.plagiarismtoday.com/tag/dns/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.plagiarismtoday.com</link>
	<description>Content Theft, Plagiarism, Copyright Infringement</description>
	<lastBuildDate>Mon, 13 Feb 2012 06:51:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Distil: The Anti-Scraping Content Protection Network</title>
		<link>http://www.plagiarismtoday.com/2012/01/26/distil-the-anti-scraping-content-delivery-network/</link>
		<comments>http://www.plagiarismtoday.com/2012/01/26/distil-the-anti-scraping-content-delivery-network/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 19:00:40 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[cdn]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[distil]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[RSS scraping]]></category>
		<category><![CDATA[Scraping]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=12404</guid>
		<description><![CDATA[Distil is a new company promising to combat scraping while improving your site's performance. But how well does it work?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2012/01/distil-logo.jpg" alt="Distil Logo" title="Distil Logo" width="240" height="84" class="alignleft size-full wp-image-12417" />I&#8217;ve talked a lot on Plagiarism Today about the dangers of scraping, including both <a href="http://www.plagiarismtoday.com/2011/05/09/faqs-the-basics-of-rss-scraping/">RSS scraping</a>, where someone copies the content in your RSS feed and, usually, republishes it elsewhere, <a href="http://www.plagiarismtoday.com/2011/11/16/scraping-not-just-for-rss-feeds-anymore/">and site scraping</a>, where search-engine-like crawlers grab your site&#8217;s content for various purposes. </p>
<p>Defending against scraping, however, is incredibly difficult. Though some plugins and tools like <a href="http://wordpress.org/extend/plugins/bad-behavior/">Bad Behavior for WordPress</a> and <a href="http://www.javascriptkit.com/howto/htaccess13.shtml">simple blocking of bots</a> can help, they aren&#8217;t perfect or complete solutions and, in some cases, can deeply drain both your time and your site&#8217;s resources.</p>
<p>However, <a href="http://www.distil.it/">the team over at Distil thinks they have found a better way</a>. By acting as an intermediary between the Web and your site, they claim to not only be able to filter out most scrapers and infringers, but also to speed up your site and improve its performance.</p>
<p>It works by combining their anti-scraping and bad bot technology with a robust content delivery network. This enables them to not only filter out threats to your site, but also serve much of your static content quickly and from servers located nearest to your visitors. </p>
<p>But is Distil worth the time and money? I decided to give it a trial and see what I found.<span id="more-12404"></span></p>
<h4>What is Distil?</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2012/01/threat-summary.jpg" alt="Distil Threat Summary" title="Distil Threat Summary" width="312" height="269" class="alignright size-full wp-image-12419" />The closest comparison one can make to Distil is <a href="http://www.cloudflare.com">Cloudflare</a> as both use DNS changes to better protect and speed up your site. </p>
<p>With Distil (or Cloudflare) you edit your DNS settings, which can usually be found at your domain registrar or in your site&#8217;s control panel, to direct visitors not to your server, but to a custom nameserver from Distil. Visitors will then query Distil for your site, which first filters out any malicious users and then delivers any content it can from its servers, which are spread all across the world. Anything it can&#8217;t deliver, it queries from your server and then provides to the user directly. </p>
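<p>As a rough illustration of that flow, here is a minimal Python sketch of an intermediary that filters a request, serves it from its own cache when it can, and otherwise queries the origin server. Every name in it (the blocklist entry, <code>handle_request</code>, <code>fake_origin</code>) is hypothetical, and caching every fetched response is a simplifying assumption; it is not how Distil or Cloudflare is actually implemented.</p>

```python
# Hypothetical sketch: filter first, then cache, then fall back to the origin.
# None of these names or rules come from Distil or Cloudflare.

BLOCKED_USER_AGENTS = {"BadScraperBot"}  # assumed blocklist entry


def handle_request(path, user_agent, cache, fetch_from_origin):
    """Return (status, body) for one visitor request."""
    # Step 1: turn away visitors flagged as malicious.
    if user_agent in BLOCKED_USER_AGENTS:
        return 403, "blocked"
    # Step 2: serve from the intermediary's own copy when one exists.
    if path in cache:
        return 200, cache[path]
    # Step 3: otherwise query the origin server, keep a copy, and respond.
    body = fetch_from_origin(path)
    cache[path] = body
    return 200, body


origin_hits = []  # records every request that reaches the origin


def fake_origin(path):
    """Stand-in for the site's real server."""
    origin_hits.append(path)
    return "content of " + path


edge_cache = {}
first = handle_request("/logo.jpg", "Mozilla/5.0", edge_cache, fake_origin)
second = handle_request("/logo.jpg", "Mozilla/5.0", edge_cache, fake_origin)
blocked = handle_request("/logo.jpg", "BadScraperBot", edge_cache, fake_origin)
print(first, second, blocked, origin_hits)
```

<p>In this sketch only the first request reaches the origin. The second is answered from the cache and the blocked bot never gets past the filter.</p>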
<p>The end result, if all goes well, is that most of the content of your site is delivered directly from Distil&#8217;s servers, which should be faster than coming from your own, and most malicious users, including scrapers, are filtered out before they ever reach your site or your content. Best of all, the process is completely invisible to end users (other than the potential speed increase).</p>
<p>To find out if it works as advertised, I switched Plagiarism Today over to Distil last weekend and, as of this writing, have been using it for the better part of a week.</p>
<h4>Setting Up and Using Distil</h4>
<p>To start using Distil, you have to first sign up for an account and have it activated. Once that&#8217;s done, you&#8217;ll be given an address to which, using your DNS settings, you will direct both your www.domain.com and domain.com (as well as any other subdomains you want to redirect).</p>
<p>Then, after the DNS changes propagate, you should be using Distil&#8217;s service. From there, you can log into the Distil dashboard, which lets you configure a variety of options including:</p>
<ul>
<li>Site Acceleration Settings (if available)</li>
<li>Rate Limiting</li>
<li>Blocking Known Violators</li>
<li>Blocking Bad User Agents</li>
<li>Browser Integrity Checks</li>
<li>Filter By Country</li>
<li>Block Bad Referrers</li>
<li>Whitelist/Blacklist</li>
<li>WWW/Non-WWW Routing</li>
</ul>
<p>You also get a bevy of statistical data including information about the number of unique sessions, the total number of requests, total human requests and the total bot requests. Bot requests are then further broken down by the number of search engine requests (which are always allowed) and the number of blocked requests (as well as the reasons for being blocked). The blocked bots are then further broken down by bot type, IP address and more.</p>
<p>The result is that you get an overall perspective of what&#8217;s going on with your site, both in terms of human traffic but, more directly, the security threats you&#8217;re facing. </p>
<p>But does that make Distil worth trying? A lot of it depends on your needs and what you&#8217;re looking to get out of it.</p>
<h4>The Good of Distil</h4>
<p>The one thing that immediately struck me about Distil is the granular level of control it gives you over security issues. Though Cloudflare offers a good deal of site security, it&#8217;s focused on spammers and attackers and only lets you set a broad level of security (low, medium, high or basically off). With Distil, you can set individual options to your liking both to target the threats most relevant to your site and, more importantly, make sure you don&#8217;t interfere with legitimate users.</p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2012/01/distil-settings-sample-500x163.jpg" alt="Distil Settings Image" title="Distil Settings Image" width="500" height="163" class="alignnone size-large wp-image-12433" /></p>
<p>Over the past few days I&#8217;ve had no reports of legitimate visitors being hassled by Distil, something that was an occasional problem with Cloudflare, especially for visitors from outside the U.S. and Europe. </p>
<p>So, even though Distil did not block as many bots as Cloudflare (likely because I have the security settings for most features turned down or off), it did a better job staying out of the way and still seemed to stop the most egregious offenders. Over time, I plan on slowly increasing the settings to see if they block more and continue to be non-intrusive.</p>
<p>Beyond security, my first concern after switching to Distil was that my site might take a performance hit. Having been a Cloudflare user for many months, I was used to the power of a robust CDN. However, I did a series of tests both before and after the change and found that Distil was usually slightly faster than Cloudflare, often shaving off 30% of the site&#8217;s loading time. </p>
<p>Compare these two example results, first before: </p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2012/01/PT-cloudflare-performance-500x197.jpg" alt="" title="PT CloudFlare Performance" width="500" height="197" class="alignnone size-large wp-image-12405" /></p>
<p>And then after:</p>
<p><img src="http://www.plagiarismtoday.com/wp-content/uploads/2012/01/pt-distil-test-500x174.jpg" alt="PT Distil Test" title="PT Distil Test" width="500" height="174" class="alignnone size-large wp-image-12406" /></p>
<p>(Note: While this example isn&#8217;t an apples-to-apples test due to differing endpoints, the results were consistent regardless of endpoint. Also, obviously there were other changes made in the four days between the tests, though no major alterations, frontend or back, were made.)</p>
<p>Finally, the support team at Distil is, simply put, the best of any company I&#8217;ve worked with. They answered every question I had very promptly, usually within 15 minutes and it didn&#8217;t seem to matter what time of the day I was asking it. This enabled me both to get my site set up quickly with Distil despite some confusion and questions and deal with an issue with Google Analytics (that turned out to be my own fault). </p>
<p>All in all, Distil did a good job in providing granular security control, a site performance boost and great support.</p>
<h4>The Problems with Distil</h4>
<p>The biggest initial problem with Distil is that, in its current form, it is not very simple to use. Not only do you have to wait for your account to be activated by a human, but the process of switching over your DNS is not as straightforward as it is with Cloudflare. </p>
<p>If you aren&#8217;t comfortable working with DNS and aren&#8217;t familiar with how to edit CNAME and A records, the process is going to be intimidating. Sadly, unlike Cloudflare, there isn&#8217;t a great deal of hand holding unless you contact support. While I agree with Distil that it&#8217;s better to not hand over total DNS control to a third party, as you have to do with Cloudflare, it&#8217;s also the much more difficult route for the user.</p>
<p>Another issue I have with Distil is the current pricing structure. The free account, which does not have content acceleration, offers only 5 GB of traffic per month, an amount even a modest blogger will likely blow through quickly. A site of Plagiarism Today&#8217;s size fits (barely) under the cap for the small account, which offers 50 GB of transfer for $29 per month. However, Cloudflare&#8217;s free plan allows for unlimited traffic and its pro account, which offers additional statistics and monitoring, is only $20 per month. Other CDNs, such as MaxCDN, charge only $50 for 1 TB (1000 GB) of data. </p>
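<p>To put those figures on a common footing, the short Python sketch below works out the cost per gigabyte for the two plans that have a stated transfer allowance. The prices and allowances are the ones quoted above; the helper name <code>cost_per_gb</code> and the plan labels are just for illustration, and the comparison ignores every feature other than transfer.</p>

```python
# Price and transfer figures are the ones quoted in this article.
plans = {
    "Distil small plan": (29.0, 50),   # $29 per month for 50 GB
    "MaxCDN": (50.0, 1000),            # $50 for 1 TB (1000 GB)
}


def cost_per_gb(price_dollars, gigabytes):
    """Dollars paid for each gigabyte of transfer."""
    return price_dollars / gigabytes


for name, (price, gigabytes) in plans.items():
    print(f"{name}: ${cost_per_gb(price, gigabytes):.2f} per GB")
# Distil small plan: $0.58 per GB
# MaxCDN: $0.05 per GB
```

<p>On transfer alone, that makes the small Distil plan more than ten times as expensive per gigabyte.</p>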
<p>Distil told me that they are considering restructuring their pricing in the coming weeks, a move that, most likely, will help with this problem.</p>
<p>For now at least, Distil is a terrible deal as a CDN, though its security features may help to make it more compelling to webmasters concerned about scraping and content misuse.</p>
<p>Finally, Distil, obviously, won&#8217;t be able to help with at least some kinds of scraping. RSS scraping likely won&#8217;t be blocked unless the bot doing it is already in the system and it is unclear just how many are. However, if you know the bot you can add it yourself in your control panel. Also, <a href="http://www.plagiarismtoday.com/2012/01/19/plagiarism-for-hire-the-changing-business-of-plagiarism/">any human copying won&#8217;t be blocked</a> because the system is designed precisely to allow humans to access your site.</p>
<p>Despite these limitations, there are still a lot of webmasters who would likely benefit from Distil, even if that number could be a great deal larger down the road.</p>
<h4>Bottom Line</h4>
<p>Distil isn&#8217;t perfect. It&#8217;s a new company and its product certainly has its share of flaws. Right now, it&#8217;s aimed at a fairly niche market of webmasters who are technically savvy, want a great deal of granular control over their site&#8217;s security and are willing to pay extra to make it happen.</p>
<p>However, with some changes to its setup procedure, pricing and control panel, it could become a compelling option for many more sites. </p>
<p>In short, Distil is going to be a company to watch in the coming months and years. As it refines its tools and pricing, it could become a major force for helping content creators protect their work. </p>
<p>In the meantime though, other webmasters just wanting a CDN to improve their site&#8217;s performance will, most likely, want to look at other solutions, such as Cloudflare and MaxCDN, as they are significantly cheaper and, in the case of Cloudflare, provide better analytics, easier setup and at least some decent, if simplified, security features.</p>
<p>Still, if you&#8217;re in Distil&#8217;s niche, which is likely to grow, I can see why it would be a very powerful solution to a complex problem. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2012/01/26/distil-the-anti-scraping-content-delivery-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>3 Count: Vegas Odds</title>
		<link>http://www.plagiarismtoday.com/2011/12/13/3-count-vegas-odds/</link>
		<comments>http://www.plagiarismtoday.com/2011/12/13/3-count-vegas-odds/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 18:05:12 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Copyright News]]></category>
		<category><![CDATA[bratz]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[dolls]]></category>
		<category><![CDATA[isps]]></category>
		<category><![CDATA[mga]]></category>
		<category><![CDATA[Photography]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[righthaven]]></category>
		<category><![CDATA[sopa]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=12052</guid>
		<description><![CDATA[SOPA gets some tweaks, doesn't please critics, Righthaven may have copyrights auctioned off and MGA wins one of its Bratz-related lawsuits.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2010/07/3count004-trim.png" alt="" title="3count004-trim" class="alignleft size-full wp-image-7303" height="162" width="175"></p>
<p><em>Have any suggestions for the 3 Count? Let me know via Twitter <a href="http://twitter.com/plagiarismtoday">@plagiarismtoday</a>.</em></p>
<h4>1: <a href="http://news.cnet.com/8301-31921_3-57341916-281/new-version-of-sopa-copyright-bill-old-complaints/">New Version of SOPA Copyright Bill, Old Complaints</a></h4>
<p>First off today, Rep. Lamar Smith (R-Texas) has announced that he has made several tweaks to the Stop Online Piracy Act (SOPA) to address &#8220;legitimate concerns&#8221; about the bill. Those changes include narrowing which sites could be targeted by rightsholder lawsuits, restricting which sites would be ordered to remove such sites from their queries and enabling the removal of just a portion of a site. However, the heart of the bill remains, including giving copyright holders the ability to order ISPs, payment processors and advertising networks to block or stop doing business with so-called &#8220;Rogue&#8221; sites. As a result, the changes have not done much to appease critics of the bill. The bill is due for a vote in the House Judiciary Committee tomorrow, where it is widely expected to make it through. </p>
<h4>2: <a href="http://www.vegasinc.com/news/2011/dec/12/righthaven-backed-corner-copyrights-be-auctioned/">Righthaven Backed Into a Corner; Copyrights to be Auctioned</a></h4>
<p>Next up today, Righthaven, fresh off a string of defeats, appears to be backed into a corner as a judge seems ready to auction off Righthaven&#8217;s copyrights, possibly marking an end to the company&#8217;s litigation campaign. Righthaven had sued some 275 defendants over using content from the Las Vegas Review-Journal and the Denver Post but filed as the copyright holder rather than the law firm representing the companies. They lost several cases based on a lack of standing to sue as it was revealed the newspapers had retained control over the copyright. The Review-Journal did eventually assign control in some of the works to help with Righthaven&#8217;s appeals but, after being ordered to pay legal fees for several defendants and unable to muster up enough cash, Righthaven may have those same copyrights auctioned off, giving it nothing to sue over.</p>
<h4>3: <a href="http://boingboing.net/2011/12/12/inspiration-isnt-infringemen.html">Bratz Copyright Lawsuit Tossed</a></h4>
<p>Finally today, one of the Bratz lawsuits was tossed but it wasn&#8217;t the big one between MGA and Mattel. Instead, it was the one that pitted photographer Bernard Belair against MGA, the makers of the Bratz line. Belair had created a series of advertisements in the late 90s featuring oddly proportioned dolls. A representative for MGA admitted that the ads were inspiration for the Bratz line but a judge ruled that the inspiration did not cross the line into infringement as it didn&#8217;t rise to &#8220;substantial similarity&#8221;. There is no word if Belair plans on appealing.</p>
<h4>Suggestions</h4>
<p>That&#8217;s it for the three count today. We will be back tomorrow with three more copyright links. If you have a link that you want to suggest for the column or have any proposals to make it better, feel free to leave a comment or send me an email. I hope to hear from you. </p>
<h4>Want the Full Story?</h4>
<p>Tune in <a href="http://www.plagiarismtoday.com/podcast">every Wednesday evening at 5 PM ET for the live recording of the Copyright 2.0 Show</a> or wait and get the edited version <a href="http://www.plagiarismtoday.com/category/podcast/">Friday right here on Plagiarism Today</a>. </p>
<p><em>The 3 Count Logo was created by <a rel="nofollow" href="http://www.cloudjunkies.com/">Justin Goff</a> and is licensed under a <a rel="nofollow" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution License</a>. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/12/13/3-count-vegas-odds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DNS, SOPA, Content Blocking and More</title>
		<link>http://www.plagiarismtoday.com/2011/11/15/dns-sopa-content-blocking-and-more/</link>
		<comments>http://www.plagiarismtoday.com/2011/11/15/dns-sopa-content-blocking-and-more/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 21:57:12 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[sopa]]></category>
		<category><![CDATA[stop online piracy act]]></category>
		<category><![CDATA[website blocking]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=11815</guid>
		<description><![CDATA[With all of the talk about the Stop Online Piracy Act, it's important to take a look at the tech behind site blocking and how it might work if passed.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2011/11/dns-tree-image-300x233.jpg" alt="DNS Tree Image" title="DNS Tree Image" width="300" height="233" class="alignleft size-medium wp-image-11816" />If you&#8217;ve been reading about the Stop Online Piracy Act (SOPA), the Preventing Real Online Threats to Economic Creativity and Theft of Intellectual Property Act (PROTECT-IP) or the bill&#8217;s new name, the Enforcing and Protecting American Rights Against Sites Intent on Theft and Exploitation Act (E-PARASITES), you&#8217;ve likely been hearing a good deal about the site blocking provisions, which would enable rightsholders to get court injunctions to block access to certain websites.</p>
<p>Rather than talk about the bill itself (Note: <a href="http://www.copyhype.com/2011/11/dispatches-from-the-sopacolypse/">Read Terry Hart&#8217;s CopyHype</a> and <a href="http://www.techdirt.com/articles/20111108/00553216676/why-protect-ipsopa-is-exact-wrong-approach-to-dealing-with-infringement-online.shtml">Mike Masnick&#8217;s Techdirt</a> for pro/con analyses on the issue) I wanted to talk about the technology behind it and what it means for copyright holders both large and small. </p>
<p>However, to do that, we first have to take a look at what DNS is, how it works and how these bills would likely work with it.</p>
<p>Then, and only then, can we start to figure out what the likely impact is going to be.<span id="more-11815"></span></p>
<h4>What is DNS?</h4>
<p>DNS stands for Domain Name System and it is a distributed system for converting between the domain names we humans understand and the numbers that machines on the Web need.</p>
<p>Basically, every computer on the Web is identified by an IP address. Though multiple machines or sites can have the same IP, for another computer to get to yours it needs the IP address to get there. IP addresses, in the current version, are a series of four numbers from 0 to 255 separated by periods. If you go to <a href="http://www.whatismyip.com/">WhatIsMyIP</a> you&#8217;ll see what your public IP address is.</p>
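<p>As a small illustration of that format, the Python sketch below uses the standard library <code>ipaddress</code> module to check whether a string is a valid address in the current version (IPv4). The helper name <code>is_ipv4</code> and the sample addresses, which come from ranges reserved for documentation, are assumptions made for the example.</p>

```python
import ipaddress


def is_ipv4(text):
    """True when text is four numbers from 0 to 255 separated by periods."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


print(is_ipv4("192.0.2.10"))   # four numbers, all in range
print(is_ipv4("256.1.1.1"))    # first number is above 255
print(is_ipv4("192.0.2"))      # only three numbers

# The four numbers are simply the four bytes of the address.
print(list(ipaddress.IPv4Address("192.0.2.10").packed))
```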
<p>While this system is great for machines, humans would find it difficult to remember the IP address for every site they wanted to visit. DNS was created to let humans use easy-to-remember domain names (e.g., plagiarismtoday.com) and translate those domains into machine-usable IP addresses.</p>
<p>To enable this, there are DNS servers positioned all over the world. These servers function like a telephone directory, converting the domain name into an IP address. When you type in a domain your computer hasn&#8217;t been to recently, your computer queries your designated DNS server, gets the DNS record and visits the site.</p>
<p>Generally, if you haven&#8217;t altered your DNS settings, you are using a DNS server provided by your ISP. That server, in turn, gets its information by working down from the root DNS servers, which sit atop the DNS tree. Every time a change is made to a domain&#8217;s IP, the domain&#8217;s authoritative servers are updated but it takes a while for that information to trickle down to the user-facing servers, which typically cache DNS data for anywhere from 4-72 hours. </p>
<p>This is why, for example, when you move hosts it takes a while to show the changes on your home computer.</p>
<p>The main thing to remember though is that DNS functions like a phone book, listing domains and their IP addresses and converting the two for your computer. Without it, domains don&#8217;t work properly and you have to manually type in IP address information.</p>
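<p>The phone book idea, and the caching delay described above, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class <code>CachingResolver</code>, the domain <code>example.com</code>, the documentation-range IP addresses and the four-hour cache lifetime are all hypothetical, and time is passed in by hand rather than read from a clock. Real DNS servers are considerably more involved.</p>

```python
class CachingResolver:
    """Toy user-facing DNS server that remembers answers for a fixed time."""

    def __init__(self, upstream, ttl_seconds):
        self.upstream = upstream   # dict standing in for the up-to-date records
        self.ttl = ttl_seconds     # how long a cached answer is reused
        self.cache = {}            # domain -> (ip, expiry time)

    def resolve(self, domain, now):
        cached = self.cache.get(domain)
        # Reuse the stored answer until it expires.
        if cached is not None and cached[1] > now:
            return cached[0]
        # Otherwise look the domain up again and remember the answer.
        ip = self.upstream[domain]
        self.cache[domain] = (ip, now + self.ttl)
        return ip


HOUR = 3600
records = {"example.com": "192.0.2.10"}
resolver = CachingResolver(records, ttl_seconds=4 * HOUR)

before_move = resolver.resolve("example.com", now=0)

# The site moves hosts, so the up-to-date record changes.
records["example.com"] = "198.51.100.7"

one_hour_later = resolver.resolve("example.com", now=1 * HOUR)
five_hours_later = resolver.resolve("example.com", now=5 * HOUR)
print(before_move, one_hour_later, five_hours_later)
```

<p>An hour after the move the resolver still hands out the old address. Only once the cached entry expires does it pick up the new one, which is the delay you see after changing hosts.</p>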
<h4>How DNS Blocking Works</h4>
<p>What this has to do with the above bills is fairly simple. According to them, a rightsholder could get a court order that forces DNS providers in the U.S. to block access to a &#8220;rogue&#8221; website.</p>
<p>That would, most likely, work by having DNS providers simply remove the site&#8217;s domain from their list, similar to removing a name from the phonebook. Either that or the line in the database would be changed to point to the incorrect address, similar to changing the number in a phone book to direct callers elsewhere.</p>
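<p>Both approaches can be sketched against a toy phone book in Python. The function names, the domains and the documentation-range IP addresses are hypothetical, and nothing here reflects how any particular DNS provider would actually implement a court order.</p>

```python
def block_by_removal(records, domain):
    """Return a copy of the phone book with the domain's entry removed."""
    filtered = dict(records)
    filtered.pop(domain, None)
    return filtered


def block_by_redirect(records, domain, notice_ip):
    """Return a copy of the phone book with the domain pointed elsewhere."""
    filtered = dict(records)
    filtered[domain] = notice_ip
    return filtered


records = {
    "rogue.example": "203.0.113.5",
    "blog.example": "198.51.100.7",
}

removed = block_by_removal(records, "rogue.example")
redirected = block_by_redirect(records, "rogue.example", "192.0.2.99")

print(removed.get("rogue.example"))      # no entry, so the lookup fails
print(redirected.get("rogue.example"))   # the lookup returns the wrong address
print(records.get("rogue.example"))      # an unfiltered phone book is unchanged
```

<p>Note that in both cases the site itself is untouched. Only the answers given by the filtered phone book change.</p>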
<p>Similar blocking techniques are already fairly widely used. On the smallest level, many homes and businesses use similar DNS alteration techniques to block access to malware, pornography and other sites they don&#8217;t want those within visiting. <a href="http://www.opendns.com/">OpenDNS</a>, for example, does this for users for free.</p>
<p>On a grander scale, this kind of DNS filtering is part of many countries&#8217; attempt at filtering the Web and is a key component of the <a href="http://www.tgdaily.com/software-features/49083-great-firewall-of-china-goes-global-in-dns-muddle">Great Firewall of China</a>.</p>
<p>So, while the tech has been used for good and evil in different situations, the more important question right now is &#8220;How effective would it be?&#8221;</p>
<h4>The Effectiveness of DNS Filtering</h4>
<p>Though DNS filtering sounds like a death blow to a website, it is anything but. After all, the site still exists and it does so at the same IP address, it just means a computer using that DNS service can&#8217;t get that IP using the domain. </p>
<p>This gives several options to a site that is blocked, including getting a new domain or directing visitors to use their IP. Users who want to get around such a block can do so easily as well by either switching to a different DNS service, possibly one located in another country, using the IP address directly or using a proxy (connecting to the Web via a third party computer) to bypass the DNS service.</p>
<p>In short, hardcore pirates would not likely be deterred, at least not greatly. A recent court order in the UK to force ISPs to block the site Newzbin 2, a Usenet scraper, <a href="http://www.bbc.co.uk/news/technology-15572495">has met with, at best, mixed results</a> as the audience of Newzbin 2 is already tech-savvy and less-than-casual in their piracy.</p>
<p>Casual pirates, however, might be more frustrated. Though it&#8217;s trivial to change DNS providers (<a href="http://www.theregister.co.uk/2011/11/09/dns_malware_scam/">a recent malware scam did it on some 4 million PCs without users knowing</a>), it&#8217;s not something that most users know how to do or would feel comfortable doing.</p>
<p>Furthermore, pointing DNS to an untrusted third party can lead to other security risks as they can literally direct any domain to any server at will, raising issues of malware, identity theft and more. This makes the risks and effort higher than the reward for many casual pirates.</p>
<p>A lot of these workarounds could be mitigated by including IP address blocking with the DNS blocking, thus preventing anyone from accessing the IP address directly, but sites can and do change IP addresses regularly without much effort. Also, maintaining such a list would be a much greater challenge and greatly increases the risk of non-infringing sites being blocked as, at times, thousands of sites can share one IP address.</p>
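<p>The shared-IP problem is easy to see in a small Python sketch. The hosting table below is invented for illustration (the domains and documentation-range addresses are assumptions), and <code>collateral_damage</code> is a hypothetical helper that lists the other sites that would vanish if a target site&#8217;s IP address were blocked.</p>

```python
# Invented hosting table: three sites share one IP address.
hosting = {
    "rogue.example": "203.0.113.5",
    "blog.example": "203.0.113.5",
    "shop.example": "203.0.113.5",
    "other.example": "198.51.100.7",
}


def collateral_damage(hosting, target):
    """Other domains that share the target's IP and would be blocked with it."""
    target_ip = hosting[target]
    return sorted(
        domain
        for domain, ip in hosting.items()
        if ip == target_ip and domain != target
    )


print(collateral_damage(hosting, "rogue.example"))
# ['blog.example', 'shop.example']
```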
<p>In the end, the question isn&#8217;t if someone will be able to get around these measures, it&#8217;s a matter of how many and what number of hurdles will they have to leap to make it happen.</p>
<h4>What Does This Mean for Me?</h4>
<p>It&#8217;s almost impossible to predict what the bill would mean for the Web if passed. There are predictions on all sides but much of the actual impact would depend on how courts approach the new law and how it is applied. That, as the Digital Millennium Copyright Act (DMCA) has shown, can be almost impossible to predict.</p>
<p>There are predictions on both sides ranging from sites like YouTube being blocked to a much more limited use where only a handful of extreme &#8220;rogue sites&#8221; are blocked. Those in favor of the bill claim it will only target the &#8220;worst of the worst&#8221; while those opposed to it claim it&#8217;s so broad it could impact almost any site.</p>
<p>Both of those predictions are theoretical until the law passes and the courts begin to wrangle with it. </p>
<p>One thing that is certain, however, is that smaller copyright holders probably will see no benefit from it. Even if the bill isn&#8217;t used just on the &#8220;worst of the worst&#8221;, getting a site blocked still requires a court order. Filing suit for copyright infringement is far too expensive for most smaller copyright holders, making it unlikely that they will be able to use this bill in any way. (Note: Some versions of the bill have it so that rightsholders can force payment processors and advertisers to stop working with infringing sites on a notice-and-takedown basis rather than a court order).</p>
<p>In short, the most likely way a small-to-medium sized rightsholder will see anything from this bill is if they are visiting one of the sites that gets blocked.</p>
<h4>Bottom Line</h4>
<p>As hinted above, the bills have a lot more to them than just website filtering; they also include provisions for forcing payment processors, advertisers, search engines and other companies to stop doing business with or listing such sites. These provisions, in the long run, could have a much larger impact on the Web.</p>
<p>One thing that is clear though is that these bills are a hot topic right now and have people lined up on both sides to either support or condemn them.</p>
<p>This makes it all the more important, to me, to understand how the technology would work and analyze what the likely outcomes of the tech would be. </p>
<p>While we can&#8217;t accurately predict how the bill would be applied, we can understand how it would work and that can give at least some insight to the legislation and its importance for the Web.</p>
<p><em>DNS Tree Image By: <a href="http://en.wikipedia.org/wiki/File:Domain_name_space.svg">LionKimbro on Wikipedia</a> &#8211; Public Domain</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2011/11/15/dns-sopa-content-blocking-and-more/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The Image/File Hosting Problem</title>
		<link>http://www.plagiarismtoday.com/2009/03/25/the-imagefile-hosting-problem/</link>
		<comments>http://www.plagiarismtoday.com/2009/03/25/the-imagefile-hosting-problem/#comments</comments>
		<pubDate>Wed, 25 Mar 2009 18:53:20 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[amazon s3]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[domain tools]]></category>
		<category><![CDATA[image hosting]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[who is hosting this]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=3087</guid>
		<description><![CDATA[There are times where content that appears to be on one server is really elsewhere. Here's how to overcome that problem. ]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/03/amazon-s3-logo.png" alt="amazon-s3-logo" title="amazon-s3-logo" width="185" height="88" class="alignleft size-full wp-image-3088" /></p>
<p>In 2007 I wrote an article entitled &#8220;<a href="http://www.plagiarismtoday.com/2007/12/20/why-i-embed-my-images/">Why I Embed My Images</a>&#8221; that discussed how embedding images and other files can provide greater security when you feel there is a risk someone might file a takedown notice. By separating your images from your server, should someone file a takedown notice over an image, your site will remain active and, with good backups, you can get your site back up more quickly.</p>
<p>It is a way to guard against misuse of the DMCA or fair use disputes.</p>
<p>However, since then I have backed away from that stance. Once I moved to my new VPS, I stopped hosting images remotely as I have a good relationship with my host and have no reasons to worry. That being said, in an effort to improve the efficiency of the site, I&#8217;ve also started toying with <a href="http://aws.amazon.com/s3/">Amazon S3</a> to see if it can help improve the site&#8217;s speed (the images in this post will be hosted on S3 as part of the test).</p>
<p>It was at this point that I realized a problem. If I were malicious in my use of S3, or any similar service, it could be used as a method not to prevent complete site failure, but to avoid a DMCA notice altogether. It is possible, using these services, to trick users into filing complaints with the wrong hosts, delaying or even preventing anything from being done.</p>
<p>I immediately, using my own site as a test subject, began to seek a way around it and, fortunately, found a way to ensure that, no matter where a file is hosted, you&#8217;ll always be able to track down the host with reasonable accuracy.<span id="more-3087"></span></p>
<h4>The Nature of the Problem</h4>
<p>If you right click on the images in this post and view their URL, you&#8217;ll see that they are hosted on a subdomain of Plagiarism Today named &#8220;files.plagiarismtoday.com&#8221;. This makes it appear, including to many automated tools, that the content is hosted on the same server as the rest of the site. The problem is that they are hosted on Amazon S3, clear across the country.</p>
<p>This trick is fairly trivial to do and <a href="http://www.labnol.org/internet/host-images-files-on-amazon-s3-storage/4923/">only involves a minor tweak to DNS</a>. There are many legitimate reasons for doing it, for example, hosting images on your domain while using a content delivery network to increase speed.</p>
<p>However, if a copyright holder decided one of these images was infringing, filing a DMCA notice would be difficult. The reason is that since the files are on a subdomain of plagiarismtoday.com most will assume they&#8217;re located on my server and act accordingly. This is due to a fluke in both the way we read URLs, where we routinely ignore subdomains, and the way networking tools routinely discard subdomain information.</p>
<p>Some copyright holders, especially those less familiar with DNS and networking, might not consider this and could inadvertently file a DMCA notice or other abuse complaint with the wrong host. This can result in a delay in getting a complaint resolved, in it being outright ignored or even causing it to be handled in a questionable way.</p>
<p>The good news is that there is a simple way around it and, as long as you are careful about how you gather your information, there is no need to make this mistake.</p>
<h4>Dealing with Linked Files</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://files.plagiarismtoday.com/wp-content/uploads/2009/03/wiht-logo-1-300x65.png" alt="wiht-logo-1" title="wiht-logo-1" width="300" height="65" class="alignright size-medium wp-image-3092" /></p>
<p>When you&#8217;re dealing with an image file or any content that is linked into a Web page (not part of the actual HTML) it is important to make sure that you get the correct information about where that particular file is hosted, not just the page that it is on.</p>
<p>The solution is pretty simple:</p>
<ol>
<li><strong>Get the URL of the File:</strong> Rather than copying the URL of the page, right click the image or the link and copy the URL. Check and see if it is on the same site, a subdomain or another domain altogether.</li>
<li><strong>Use Who Is Hosting This:</strong> Once you have the URL, delete the &#8220;http://&#8221; as well as everything including and after the first remaining &#8220;/&#8221; and process it through <a href="http://www.whoishostingthis.com">Who is Hosting This</a>. Who Is Hosting This handles subdomains correctly, unlike Domain Tools, which strips out subdomain information in my testing.</li>
<li><strong>Confirm the Results:</strong> You can then confirm the results by copying the IP address (you&#8217;ll have to actually copy the numbers on the site, not use the link) and then running it through <a href="http://domaintools.com">Domain Tools</a>. Once you&#8217;ve done that, you can then go forward and begin the work of finding the DMCA or abuse agent and contacting them.</li>
</ol>
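<p>For those who prefer to script the first step, here is a minimal Python sketch of the same idea, using only the standard library. The function names <code>host_from_url</code> and <code>resolve</code> are my own, chosen for illustration, and this is not how Who Is Hosting This or Domain Tools work internally. It simply keeps the full hostname, subdomain included, and can then look up the IP address to run through an IP Whois.</p>

```python
import socket
from urllib.parse import urlparse


def host_from_url(file_url):
    """Return the full hostname of a file URL, subdomain included."""
    # Add a scheme if it is missing so urlparse treats the first part as the host
    if "://" not in file_url:
        file_url = "http://" + file_url
    hostname = urlparse(file_url).hostname
    if not hostname:
        raise ValueError("no hostname found in " + repr(file_url))
    return hostname


def resolve(hostname):
    """Look up the canonical name and IP addresses (needs network access)."""
    canonical, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return canonical, addresses


if __name__ == "__main__":
    url = "http://files.plagiarismtoday.com/wp-content/uploads/2009/03/amazon-s3-logo.png"
    # Prints the hostname to look up, with the subdomain kept intact
    print(host_from_url(url))
```

<p>Calling <code>resolve()</code> on the result returns the canonical name alongside the IP addresses. When a subdomain is an alias for a third-party service, the canonical name will often differ from the hostname you typed, which is a useful hint that the file lives elsewhere.</p>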
<p>Though this adds a few extra steps to the process, it is worth doing to ensure that you contact the correct party as doing so is the only way to guarantee the quickest and most reliable resolution.</p>
<h4>Why This is Important</h4>
<p>The reason that this is critical is because sending a DMCA notice to the wrong host, at the very least, will greatly slow down the process as the host has to research and figure out what is going on and then decide if they are going to A) Disable the page anyway B) Forward the notice on or C) Do nothing.</p>
<p>Since the company that hosts the Web site does not host the image, their role under the DMCA is much less clear. <a href="http://www.copyright.gov/title17/92chap5.html#512">Section 512(c)</a>, which usually deals with Web hosts and takedowns, only pertains to &#8220;the storage at the direction of a user of material that resides on a system or network controlled or operated by or for the service provider&#8221;. Since there is no storage, a regular DMCA notice doesn&#8217;t apply.</p>
<p><a href="http://www.copyright.gov/title17/92chap5.html#512">Section 512(d)</a> does pertain to &#8220;information location tools&#8221; but in that case, it would be the site owner, not the host, that is the party for the notice. This section deals with sites, such as Google, that are &#8220;referring or linking users to an online location containing infringing material or infringing activity&#8221;. Since the host isn&#8217;t the one linking to the file, it is the user, the application of 512(d) doesn&#8217;t make as much sense.</p>
<p>This isn&#8217;t to say that hosts won&#8217;t deactivate sites or remove pages if the content is embedded or hyperlinked, especially if the site is spammy in nature or has other abuse issues, but the fastest way to secure removal of images or other media files is to go to the source. </p>
<p>It can be a bit tedious to do, but it is well worth the time.</p>
<h4>Bottom Line</h4>
<p>The simple truth is that the days of all of the content on a site being hosted on the same server have long since passed. Content embedding from photo sharing sites, video sites and elsewhere has made it much more difficult to easily track down where a particular item is hosted.</p>
<p>Though sometimes, as with YouTube clips, where the content is hosted is obvious, other times, as with image hosts, it is much less clear. </p>
<p>Unless you are dealing with textual works, which are almost never embedded (unless you use a service such as <a href="http://www.thenewsroom.com/">Voxant Newsroom</a> that embeds text via Flash and JavaScript), this is something you have to constantly watch out for.</p>
<p>Dealing with content theft issues is not difficult, but it does require a bit of detective work. However, knowing the challenges you face and the tools that can help you overcome them can keep the sleuthing required to a minimum. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2009/03/25/the-imagefile-hosting-problem/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Quarkbase: Almost Everything About a Website</title>
		<link>http://www.plagiarismtoday.com/2008/12/17/quarkbase-almost-everything-about-a-website/</link>
		<comments>http://www.plagiarismtoday.com/2008/12/17/quarkbase-almost-everything-about-a-website/#comments</comments>
		<pubDate>Wed, 17 Dec 2008 17:13:28 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[domain tools]]></category>
		<category><![CDATA[Hosting]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[quarkbase]]></category>
		<category><![CDATA[whois alexa]]></category>
		<category><![CDATA[whoishostingthis]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2304</guid>
		<description><![CDATA[Hot on the heels of my review of AbouThiSite, we now take a look at a new service that promises to improve the way you get information about a site and fix many of the issues from the former review.]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/12/quarkbase-logo-300x96.png" alt="quarkbase-logo" title="quarkbase-logo" width="300" height="96" class="alignleft size-medium wp-image-2307" />Last week <a href="http://www.plagiarismtoday.com/2008/12/11/abouthisite-smart-weasel-useless-data/">I covered AbouThiSite</a>, a mashup that was designed to take a domain and give the user a variety of information on it including traffic estimations, PageRank and more.</p>
<p>But while AbouThiSite was interesting, its data is woefully incomplete, forcing me to continue relying on other sites for most of my information. </p>
<p>However, long-time reader <a href="http://voyagerfan5761.blogspot.com/">Voyagerfan5761</a> alerted me to a similar, though more complete, service that had flown under my radar. The service, Quarkbase, promises to provide &#8220;Everything About a Site&#8221; and to be everything I wanted AbouThiSite to be. </p>
<p>So, I excitedly gave the site a whirl and learned quickly that it is a huge step in the right direction, but not quite the endgame I was hoping for.<span id="more-2304"></span></p>
<h4>What It Does</h4>
<p>Quarkbase works very similarly to other sites in this field. Users either punch in the domain they are interested in or <a href="http://www.quarkbase.com/tools">use their bookmarklet</a>, to pull up information from dozens of resources about the domain a site is hosted on. </p>
<p>That information is then broken up into seven different categories, which can either be scrolled through on the default &#8220;All&#8221; page or quickly selected via the tabs at the top. Those sections are as follows:</p>
<ol>
<li><strong>Introduction:</strong> This includes basic information about the site including its name, common tags, contact information, logo and slogan.</li>
<li><strong>Social Popularity:</strong> This analyzes how well the site has performed on various social sites including Digg, Reddit, Delicious, etc. It also pulls the subscriber count from FeedBurner if possible.</li>
<li><strong>Traffic:</strong> This pulls in a variety of traffic estimates from Alexa. Though notoriously unreliable, it does provide clues as to which countries most frequently visit a site and a general idea of popularity.</li>
<li><strong>People:</strong> Takes a best guess at the person or people that run the site. It is not clear where this information comes from.</li>
<li><strong>Spotlight:</strong> This section attempts to glean who is &#8220;talking&#8221; about a domain, specifically by looking at Twitter. Though flawed in that it can&#8217;t parse TinyURLs, which are heavily used on the service, it still works surprisingly well.</li>
<li><strong>Company:</strong> Only shows up on reports of sites owned by a company. Offers company profile information and job postings for the company that owns the site.</li>
<li><strong>Technical:</strong> This is the &#8220;meat&#8221; of the site&#8217;s information from an abuse standpoint, providing information on who is hosting the site, what the nameservers are and the location of the server.</li>
</ol>
<p>Quarkbase is able to do this by bringing together information from a variety of sources including <a href="http://www.alexa.com">Alexa</a>, <a href="http://www.zoominfo.com/">Zoominfo</a> and more. </p>
<p>The result of all of this information is that the results page is extremely large and, at times, slow loading. The initial page is also very cluttered, though the tab feature makes it much easier to cut to what you need.</p>
<p>All in all, the information that Quarkbase provides is very robust and very simple to use. However, there are a few hiccups that prevent me from making this service my default.</p>
<h4>Small Roadblocks</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/12/quarkbase-error-300x172.png" alt="quarkbase-error" title="quarkbase-error" width="300" height="172" class="alignright size-medium wp-image-2309" />The biggest issue I have with Quarkbase is that the information, in particular in the Technical section, is questionable at best. For example, <a href="http://www.quarkbase.com/show/hostgator.com">when looking up Hostgator&#8217;s main site</a>, the site doesn&#8217;t even hazard a guess as to the hosting provider. It correctly guesses that the ISP is &#8220;THEPLANET.COM INTERNET SERVICES&#8221; but even that is not completely accurate as Hostgator is its own host, just using ThePlanet&#8217;s servers.</p>
<p>This was an issue that has tripped up other services, including Domain Tools, and <a href="http://www.plagiarismtoday.com/2008/11/25/whoishostingthis-easy-and-reliable/">prompted a fix from WhoIsHostingThis</a>. During many of my tests, Quarkbase refused to even guess about the host information, instead just leaving that line blank, and the ISP information was dubious at best.</p>
<p>Other information on the service was unreliable as well. <a href="http://paulstamatiou.com/2007/03/07/why-you-should-completely-ignore-alexa-stats">Alexa&#8217;s traffic data is notoriously unreliable</a>, though still an understandable choice under the circumstances. Also, information in the &#8220;Introduction&#8221; section is routinely either left blank or inaccurate, especially the contact information on non-company sites. </p>
<p>This limits the usability of the service, especially when competing sites such as WhoIsHostingThis have largely overcome many of the same challenges, but it still remains one of the most complete overviews of a site or domain that you can get, even with the hiccups and speedbumps.</p>
<h4>Conclusions</h4>
<p>Quarkbase is far from perfect, but what it does it does well. Though its results are not as technically-oriented as <a href="http://www.domaintools.com">Domain Tools</a> or as accurate as WhoIsHostingThis, its sheer breadth of data makes it great for a &#8220;quick overview&#8221; of a site&#8217;s information.</p>
<p>Though I again don&#8217;t think Webmasters and bloggers will get a lot of use from this tool when chasing down scrapers and plagiarists, especially since the site does not do subdomains at this time, it could provide some assistance with directly contacting infringers, locating the host and learning about the background of a site before moving in.</p>
<p>If nothing else, it might be worth seeing what Quarkbase turns up on an infringing domain just to see if there is anything you were unaware of and can use, such as an email address or contact form.</p>
<p>I can pretty much promise you that you will learn something about every site you punch into this service; the question is how accurate and useful that information will be. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/12/17/quarkbase-almost-everything-about-a-website/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>AbouThiSite: Smart Weasel, Useless Data</title>
		<link>http://www.plagiarismtoday.com/2008/12/11/abouthisite-smart-weasel-useless-data/</link>
		<comments>http://www.plagiarismtoday.com/2008/12/11/abouthisite-smart-weasel-useless-data/#comments</comments>
		<pubDate>Thu, 11 Dec 2008 14:48:58 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[abouthisite]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[domain tools]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[mashups]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Whois]]></category>
		<category><![CDATA[whoishostingthis]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2259</guid>
		<description><![CDATA[Mashup site, AbouThiSite, attempts to make the process of getting the information you need about a domain easier than ever, but does it provide the needed tools for us to get by?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/12/abouthissite-logo-300x70.png" alt="abouthissite-logo" title="abouthissite-logo" width="300" height="70" class="alignleft size-medium wp-image-2258" />Mashups, like any other technology, have the potential to be used for bad and for good. They can hurt Webmasters when done incorrectly or help them when done right.</p>
<p><a href="http://www.abouthisite.com/">AbouThiSite</a> attempts to be one of the latter kind of mashups, providing valuable information about a target domain at the click of a button. </p>
<p>But how useful is it in the real world? The answer, sadly, is not very much. It won&#8217;t be a part of my arsenal, not unless it adds some additional data. Still, there is much that can be gleaned from it, if others are willing to listen.<span id="more-2259"></span></p>
<h4>What it Does</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/12/abouthissite-sidebar.png" alt="abouthissite-sidebar" title="abouthissite-sidebar" width="286" height="199" class="alignright size-full wp-image-2260" />The idea behind AbouThiSite is very similar to <a href="http://whoishostingthis.com">WhoIsHostingThis</a> and <a href="http://domaintools.com">Domain Tools</a> in that you punch in a domain and receive back vital information on it. But where Domain Tools is targeted at those who are familiar with networking tools and WhoIsHostingThis focuses on making the process of finding a site&#8217;s host simple, AbouThiSite attempts to provide a different set of information in an easy to approach manner.</p>
<p>This includes the following:</p>
<ol>
<li>The IP Address</li>
<li>Other Sites Likely on the Same Server</li>
<li>The location of where the site is hosted.</li>
<li>The PageRank/Traffic of the Site</li>
<li>Information About the Colors and HTML of the Site</li>
</ol>
<p>This information is then displayed in a colorful and easy-to-read page that includes a Google Map of the estimated server location, a preview of the site and a link to subscribe to the site&#8217;s RSS feed, if it has one.</p>
<p>It is indeed incredibly easy to use, but, for Webmasters dealing with content theft or abuse issues, it is a fairly useless service. In fact, outside of some limited SEO purposes, I have a very difficult time imagining why anyone would favor AbouThiSite over other sites.</p>
<h4>Missing Details</h4>
<p>The most useful aspect of AbouThiSite is the SEO elements. Having the PageRank, rough traffic and IP information in one place is useful. Though the traffic stats seem to underestimate every site I punched in, the relationships between them made sense.</p>
<p>However, you can get most of this information elsewhere, the only advantage with AbouThiSite being that the information is very cleanly laid out and easy to read. Whether that is worth the trip is up to each Webmaster to decide.</p>
<p>For those wanting to deal with abuse issues, this site is missing critical information, including Whois information and information about the actual host of the site (other than its location), and it provides no easy means to obtain it.</p>
<p>Since all of the useful information can easily be procured off another site and you will have to go there regardless to get the information you need, there is little reason to make AbouThiSite a stop at all.</p>
<h4>Lessons</h4>
<p>This isn&#8217;t to say that AbouThiSite is a bad tool, just that it doesn&#8217;t fill any needs that I have. There is still a great deal it does right and others may find it useful.</p>
<p>What I sincerely hope is that other sites, especially Domain Tools, might take a look at the way AbouThiSite displays information and glean a few lessons from it, namely how to put a lot of information about a site in front of viewers in a clean, attractive manner.</p>
<p>Though appearance is definitely not everything when looking for tools to help you get the information you need, it does count, as WhoIsHostingThis has shown us. The easier a site is to read, the quicker we get the information.</p>
<p>Likewise, though WhoIsHostingThis is laser-focused and clean to use, it could also benefit from some additional information, such as the location of the host (at least the country) and, perhaps, the whois data.</p>
<p>The bottom line though is that AbouThiSite offers a glimpse of what a good domain information mashup could be without actually being that mashup.</p>
<h4>Conclusions</h4>
<p>The bottom line is that, if AbouThiSite has information that you find useful, then by all means use it. It&#8217;s a fast, user-friendly site that is easy to pick up and add to your toolbox. I, personally, don&#8217;t have much use for it nor do I see how others might.</p>
<p>That being said, it appears to me that the site is more of a proof of concept than a finished product and, with that in mind, the concept it does show is valuable.</p>
<p>The Web may not have a lot of use for this site, but there is a lot it could learn from it, if one is willing to listen. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/12/11/abouthisite-smart-weasel-useless-data/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>WhoIsHostingThis: Easy and Reliable</title>
		<link>http://www.plagiarismtoday.com/2008/11/25/whoishostingthis-easy-and-reliable/</link>
		<comments>http://www.plagiarismtoday.com/2008/11/25/whoishostingthis-easy-and-reliable/#comments</comments>
		<pubDate>Tue, 25 Nov 2008 16:53:45 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[abuse]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[Hosting]]></category>
		<category><![CDATA[networking]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Whois]]></category>
		<category><![CDATA[whoishostingthis]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/?p=2171</guid>
		<description><![CDATA[Recent improvements at WhoIsHostingThis promise to make it the go-to resource for finding the host of a site. But are the improvements good enough?]]></description>
			<content:encoded><![CDATA[<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/11/whoishostingthis-logo.png" alt="" title="whoishostingthis-logo" width="310" height="77" class="alignleft size-full wp-image-2172" />One of the hardest parts of dealing with spam, copyright infringement or other abuse issues on the Web is finding out who to report it to. To do that, typically one has to determine who is hosting the site and, though it is relatively simple with sites such as Myspace and Facebook, it gets far more complicated when dealing with blogs or sites that have their own domain names.</p>
<p>The techniques for determining who a host is are, at best, complicated and somewhat geeky in nature. Though <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/3-finding-the-host/">I wrote a guide on how to use some tools for finding the host</a>, the process remains one of the most common questions I get asked about. </p>
<p>At least one site, WhoIsHostingThis, has attempted to simplify this process, turning it into a Google-style experience. <a href="http://www.plagiarismtoday.com/2008/02/22/useful-site-who-is-hosting-this/">Previously reported on here</a>, the site did a respectable job in most cases, though there were some peculiar results on some tests.</p>
<p>The idea is that the networking wizardry should be hidden from the user and the site should receive a domain (or bookmarklet click) and then simply return the host. A great theory, especially for the non-tech oriented, but due to the nature of the work it is not always reliable. Most who are familiar with the tools, myself included, tended to lean on more sophisticated sites, such as <a href="http://www.domaintools.com">DomainTools</a>. </p>
<p>However, an upgrade at WhoIsHostingThis is attempting to change that, by fixing the kinks and bugs and, potentially, making the site a one-stop shop for domain hosting and information.<span id="more-2171"></span></p>
<h4>Some Geek Stuff</h4>
<p>The typical way to determine the host of a site is a tool called IP Whois. Basically, IP Whois works like this:</p>
<ol>
<li>All servers on the Web (as well as all computers or routers facing the Web) resolve to an IP address, a set of four numbers from 0-255.</li>
<li>Those IP addresses are controlled and doled out by various Regional Internet Registries (RIRs) that are non-profit oversight boards that help control these limited resources. <a href="http://www.arin.net">ARIN</a> is the RIR for the United States, Canada and parts of the Caribbean. </li>
<li>When RIRs assign IP addresses, they keep a registry of who is assigned what numbers. That information can be queried by an IP Whois.</li>
<li>The most common purchasers of IP addresses are Web hosts, such as GoDaddy, ISPs, such as your cable company, and academic institutions.</li>
<li>These institutions then allow their customers to use the IP address for accessing the Internet, hosting a site, etc. but usually do so only on their own network. Most of the time an IP address purchased by company X will point to a customer of their company.</li>
<li>Thus, an IP Whois can usually trace you back to who is hosting a particular site or at least who is responsible for the IP address at that particular location.</li>
</ol>
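<p>To make the steps above a bit more concrete, here is a rough Python sketch of an IP Whois lookup. It assumes the plain WHOIS protocol on TCP port 43 and uses ARIN&#8217;s public server as the default. The function names and the sample reply are made up for illustration (192.0.2.0/24 is a range reserved for documentation), and real replies vary from registry to registry, so treat the field names as assumptions rather than a guarantee.</p>

```python
import ipaddress
import socket


def whois_query(ip, server="whois.arin.net", timeout=10):
    """Send a plain WHOIS query (TCP port 43) and return the raw text reply."""
    ipaddress.IPv4Address(ip)  # raises ValueError unless ip is four numbers 0-255
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall((ip + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply):
    """Collect 'Key: value' lines, skipping comments and lines without a colon."""
    fields = {}
    for line in reply.splitlines():
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Keep the first value seen for each key
        fields.setdefault(key.strip(), value.strip())
    return fields


# Made-up reply for illustration only; 192.0.2.0/24 is a documentation range
SAMPLE_REPLY = """# sample text, not real registry output
NetRange:       192.0.2.0 - 192.0.2.255
OrgName:        Example Hosting, Inc.
"""

if __name__ == "__main__":
    print(parse_fields(SAMPLE_REPLY)["OrgName"])
```

<p>In practice you would pass the text returned by <code>whois_query()</code> to <code>parse_fields()</code> and look at the organization that the address block is assigned to. As the rest of this post explains, that organization is usually, but not always, the host.</p>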
<p>The procedure is far from perfect and, as we&#8217;ll explore, there are ways it can be gamed. But it is far more accurate than other methods, such as looking at the DNS servers, which can be trivially changed by spammers and plagiarists.</p>
<p>It is also this method that has been largely utilized by WhoIsHostingThis with great results. However, where the site has struggled has been with exceptions to the rule, cases where the IP Whois is misleading or, worse still, downright wrong. </p>
<p>Though these are cases that can usually be corrected with other tools, such as traceroutes (which look at the path traffic takes to arrive at the destination) or the DNS information, that information has, traditionally, not been used by WhoIsHostingThis.</p>
<p>That is starting to change. </p>
<h4>The &#8220;HostGator Problem&#8221;</h4>
<p><img style=' float: right; padding: 4px; margin: 0 0 2px 7px;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/11/hostgator_logo.gif" alt="" title="hostgator_logo" width="293" height="83" class="alignright size-full wp-image-2174" />In March of 2007, one of the largest moves in Web hosting took place as HostGator, the very popular budget Web host, <a href="http://news.clickfire.com/hostgator-the-planet-join-forces/204/">moved much of its 500,000 plus domains into ThePlanet&#8217;s datacenter</a>. Though the move made sense for both parties, it created an abuse reporting kludge that remains.</p>
<p>The problem is this: on those half million domains, the IP Whois information points to The Planet and not HostGator since they are located within The Planet&#8217;s network. Thus many, myself included, have sent DMCA notices or spam reports to The Planet thinking that they were the host. This has created slowdowns in addressing critical issues.</p>
<p>However, these problems are largely avoidable as the DNS servers, as well as other information, do point to HostGator as the host. The problem is that the information can be easily overlooked.</p>
<p>So, while this problem can be overcome by humans, it requires a fair amount of skill at reading networking and domain information and, even then, is prone to mistakes. WhoIsHostingThis is seeking to fix that problem by looking at multiple sources of information, including the DNS information, to determine who the host is. </p>
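<p>As a rough illustration of what &#8220;looking at multiple sources&#8221; could mean, here is a small Python sketch. To be clear, this is not WhoIsHostingThis&#8217;s actual method, which I have not seen. The lookup table, the function names and the nameserver names in the usage note are all hypothetical. The idea is simply to prefer a known host found through the DNS servers and fall back to the IP Whois result otherwise.</p>

```python
# Hypothetical table: hosting companies known to run inside another network.
# Keys are nameserver domains, values are the host to report to.
KNOWN_RESELLER_HOSTS = {
    "hostgator.com": "HostGator",
}


def registered_domain(nameserver):
    """Naive last-two-labels reduction, e.g. ns1.hostgator.com -> hostgator.com."""
    labels = nameserver.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def guess_host(ip_whois_org, nameservers):
    """Prefer a known host found via DNS; otherwise fall back to IP Whois."""
    for ns in nameservers:
        host = KNOWN_RESELLER_HOSTS.get(registered_domain(ns))
        if host:
            return host
    return ip_whois_org
```

<p>With this table, <code>guess_host("The Planet", ["ns1.hostgator.com"])</code> returns HostGator, while an unrecognized nameserver leaves the IP Whois answer in place. The last-two-labels shortcut breaks on domains such as example.co.uk, and because DNS servers can be changed trivially, a table like this should only list hosts that have been confirmed by hand.</p>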
<p><img style=' float: left; padding: 4px; margin: 0 7px 2px 0;'  src="http://www.plagiarismtoday.com/wp-content/uploads/2008/11/whoishostingthis-hostgator-300x58.png" alt="" title="whoishostingthis-hostgator" width="300" height="58" class="alignleft size-medium wp-image-2175" />In that regard it has already &#8220;fixed&#8221; the HostGator problem: a search on the site for a HostGator domain <a href="http://www.whoishostingthis.com/hostgator.com">reveals HostGator as the host</a>, not The Planet. A similar result happens when you look for WordPress.com domains, as it <a href="http://www.whoishostingthis.com/wordpress.com">shows WordPress as the host, not Layered Technologies</a>.</p>
<p>Though the site provides the additional information below the main result, in case the results are mistaken, it is right in these cases. </p>
<h4>Further Improvements</h4>
<p>Though WhoIsHostingThis has already integrated many of the hosts that, like HostGator, have their IP addresses listed as being another service, this is not to say that they have all of them. The operators of the site admit that the site needs further improvements.</p>
<p>However, where the site was previously about 95% accurate with its information, it is now most likely well over 99%. These cases where the IP Whois was wrong were rare to begin with and the site has already fixed most of the larger outliers. This means that only a fraction of a fraction of domains should return any issues.</p>
<p>That being said, there are still issues and bugs to be worked out. For one, where the site does very well with U.S. and Canada-based hosts, international ones, especially those in languages other than English, seem to give the site trouble from time to time. Also, there are still at least some cases where the information might be technically correct, but does not provide a correct URL for the host or enough information to locate it.</p>
<p>However, as I said earlier, these are extreme outliers. For most cases, WhoIsHostingThis works very well and certainly good enough for those that don&#8217;t have the technical expertise to use traditional networking tools.</p>
<h4>Conclusions</h4>
<p>Personally, I&#8217;ve begun using the WhoIsHostingThis bookmarklet to help me determine the host of sites and only using DomainTools or other sites whenever I get a strange result. It&#8217;s worked very well these past few weeks (since the updates began) and I&#8217;ve been impressed with the work that they have done.</p>
<p>Though I&#8217;m never likely to use this site, or any other site, as my exclusive resource for this kind of information (best to have confirmation no matter what you use), the improvements at WhoIsHostingThis have really impressed me. </p>
<p>While there is clearly work to be done, the progress is evident and I am very happy with the improvements they have been making. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2008/11/25/whoishostingthis-easy-and-reliable/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Video: Finding the Host</title>
		<link>http://www.plagiarismtoday.com/2007/11/30/video-finding-the-host/</link>
		<comments>http://www.plagiarismtoday.com/2007/11/30/video-finding-the-host/#comments</comments>
		<pubDate>Fri, 30 Nov 2007 20:28:35 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[DMCA]]></category>
		<category><![CDATA[Products]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[domain tools]]></category>
		<category><![CDATA[Domains]]></category>
		<category><![CDATA[Host]]></category>
		<category><![CDATA[Hosting]]></category>
		<category><![CDATA[IP]]></category>
		<category><![CDATA[IP Whois]]></category>
		<category><![CDATA[Plagiarism]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/11/30/video-finding-the-host/</guid>
		<description><![CDATA[As a follow-up to my recent updates to the Finding the Host page of the site, I&#8217;ve decided to create a short video demonstrating the process of finding a host for a domain Web site. The video, which lasts about five minutes, takes the user through the process of finding the host for this site,...]]></description>
			<content:encoded><![CDATA[<p>As a follow-up to my <a href="http://www.plagiarismtoday.com/2007/11/27/updates-to-stopping-internet-plagiarism-series/">recent updates</a> to the <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/3-finding-the-host/">Finding the Host page of the site</a>, I&#8217;ve decided to create a <a href="http://revver.com/watch/505351/">short video demonstrating the process of finding a host for a domain Web site</a>. </p>
<p>The video, which lasts about five minutes, takes the user through the process of finding the host for this site, instructing on the use of the <a href="http://www.domaintools.com">Domain Tools</a> Web site and various features of it.</p>
<p>The video is my first attempt at such a screencast so it is far from perfect. There also seemed to be a minor encoding issue when I uploaded the video to Revver. I might try a different video sharing site in the future. </p>
<p>Please let me know what you think of the video. I&#8217;ve embedded it below.</p>
<p><span id="more-739"></span><br />
<script src="http://flash.revver.com/player/1.0/player.js?mediaId:505351;affiliateId:118651;width:480;height:392" type="text/javascript"></script></p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/11/30/video-finding-the-host/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Updates to &#8220;Stopping Internet Plagiarism&#8221; Series</title>
		<link>http://www.plagiarismtoday.com/2007/11/27/updates-to-stopping-internet-plagiarism-series/</link>
		<comments>http://www.plagiarismtoday.com/2007/11/27/updates-to-stopping-internet-plagiarism-series/#comments</comments>
		<pubDate>Tue, 27 Nov 2007 21:00:21 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Housekeeping]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[Copyright-Law]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[domain tools]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[guides]]></category>
		<category><![CDATA[IP]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/11/27/updates-to-stopping-internet-plagiarism-series/</guid>
		<description><![CDATA[I&#8217;ve noted for some time that the &#8220;Stopping Internet Plagiarism&#8221; series on the site has fallen into grave disrepair. For example, the instructions for finding the host referenced a service that has not been in operation for almost a year and offered complicated instructions when easier tools were available. So, I&#8217;ve started taking some time to...]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve noted for some time that the &#8220;<a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/">Stopping Internet Plagiarism</a>&#8221; series on the site has fallen into grave disrepair. For example, the instructions for finding the host referenced a service that has not been in operation for almost a year and offered complicated instructions when easier tools were available.</p>
<p>So, I&#8217;ve started taking some time to rewrite and redraft this portion of the site and I&#8217;ve started with the two sections most sorely in need of updating.</p>
<p><span id="more-736"></span><strong>New and Improved!</strong></p>
<p>The first section to get an overhaul was the first chapter itself, <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/1-how-to-find-plagiarism/">How to Find Plagiarism</a>. To that chapter, I&#8217;ve added two sections, one targeted at bloggers and RSS scraping as well as a section targeted at videographers.</p>
<p>Since spam blogging and video sharing sites were relatively new concepts when the draft was first completed almost three years ago, those sections were not included. However, the article also underwent something of a rewrite in the Non-blogging author section as well as other tweaks to the entire piece.</p>
<p>However, it was the third chapter, <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/3-finding-the-host/">Finding the Host</a>, that has undergone the biggest revision. The previous version was practically unusable. <a href="http://www.samspade.org">Sam Spade</a>, the tool of choice when it was first penned, has been defunct for some time and the tips for determining the host of free Web sites were poorly written and hard to understand.</p>
<p>I&#8217;ve updated the page to use <a href="http://www.domaintools.com">Domain Tools</a> instead of Sam Spade and provide a much more detailed set of instructions, including screenshots.</p>
<p>This should bring that page into modern times and allow newcomers to the site to effectively use it.</p>
<p>Since editing these files and updating them is a surprisingly time-consuming process, I&#8217;m going to be doing this update over a period of a few weeks. If things go according to plan, I should have the entire series, along with other static pages on the site, updated by the end of the year or very early next.</p>
<p>Please let me know if you find any problems with them. I&#8217;m going to do my best not to let these pages wait so long before another update and they will be under more constant revision from now on. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/11/27/updates-to-stopping-internet-plagiarism-series/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Whois Service Comes Under Fire</title>
		<link>http://www.plagiarismtoday.com/2007/10/30/whois-service-comes-under-fire/</link>
		<comments>http://www.plagiarismtoday.com/2007/10/30/whois-service-comes-under-fire/#comments</comments>
		<pubDate>Tue, 30 Oct 2007 21:43:44 +0000</pubDate>
		<dc:creator>Jonathan Bailey</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Legal Issues]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Content-Theft]]></category>
		<category><![CDATA[Copyright-Infringement]]></category>
		<category><![CDATA[DNS]]></category>
		<category><![CDATA[Domain Names]]></category>
		<category><![CDATA[domain registration]]></category>
		<category><![CDATA[ICANN]]></category>
		<category><![CDATA[IP]]></category>
		<category><![CDATA[Plagiarism]]></category>
		<category><![CDATA[Registrars]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Whois]]></category>

		<guid isPermaLink="false">http://www.plagiarismtoday.com/2007/10/30/whois-service-comes-under-fire/</guid>
		<description><![CDATA[Article Updated According to a recent AP article, the Whois service, a series of databases with information about the individuals that register domains, has come under fire from privacy advocates and a new proposal seeks to do away with the service altogether. Such a move would be a tremendous blow to law enforcement, lawyers and...]]></description>
			<content:encoded><![CDATA[<p><strong>Article Updated</strong></p>
<p><a href="http://www.divshare.com/download/2572628-d78"><img align="left" hspace="5" src="http://www.divshare.com/img/2572628-d78.png" border="0" /></a>According to a recent AP article, the Whois service, a series of databases with information about the individuals that register domains, has come under fire from privacy advocates and a new proposal seeks to do away with the service altogether.</p>
<p>Such a move would be a tremendous blow to law enforcement, lawyers and researchers that regularly use the database. However, it may also alleviate some of the spam and privacy concerns that come with the database in its current format.</p>
<p>No matter what is decided this Wednesday when a committee from the Internet Corporation for Assigned Names and Numbers (ICANN) meets, this will be a major issue to follow and one that will have a major impact on both the structure of the Internet and how Webmasters protect their content.</p>
<p><span id="more-720"></span><strong>What Is Whois</strong></p>
<p>When someone registers a domain such as a .com or a .net, they provide a set of information including their name, email, address and phone number. This information is placed into a Whois database, which each domain registrar keeps, and is then searchable by using a Whois lookup tool, such as the one found on <a href="http://www.domaintools.com">Domain Tools</a>. </p>
<p>This means that, if you register a domain, anyone can look up the information that you provided for it. If you gave your personal information, that could include your home phone and address. </p>
<p>To alleviate these privacy concerns, Whois protection services such as <a href="http://domainsbyproxy.com/">Domains By Proxy</a> have sprung up to keep Whois information secret. These services work as forwarders, letting you use their address as your own and forwarding you email sent to the anonymous account they created for you. They can also be compelled to give up your actual registration info in certain events, including legal matters.</p>
<p>However, such steps have done little to alleviate the concerns of some privacy advocates. They worry about the requirement of posting personal information in a public space in order to obtain a domain and want better protection for Webmasters.</p>
<p>To make matters even worse, spammers have seized on the service, using it to harvest thousands of addresses and send out bulk mail, often relating to domain names. </p>
<p>On the other hand, law enforcement uses Whois to track down criminals. Consumer watchdogs use it to spot scammers. Also, lawyers use it to locate copyright and trademark infringers and anti-spam groups use it to track and monitor spammers.</p>
<p>In short, it is a very powerful tool with both great limitations and serious drawbacks.</p>
<p><strong>Fixing Whois</strong></p>
<p>The problems with Whois are well-documented. </p>
<p>Beyond the above-mentioned privacy issues, it is too easy to supply false information to the database. There is nothing to stop a scammer from just putting garbage into his Whois information and avoiding detection.</p>
<p>You can <a href="http://wdprs.internic.net/">report invalid Whois information</a> but action is unlikely and slow. It is also, generally, limited to the revoking of the domain and does little to actually identify the person behind the site.</p>
<p>This has resulted in a great deal of inaccurate information in the Whois databases and that, in turn, has limited the usefulness of the tool. Because of that, ICANN has started looking at ways to improve the tool and the organization is actively looking at proposals for fixing it.</p>
<p>However, most of the proposals are much more mild than the &#8220;sunset&#8221; proposal that has caused such a stir and would do away with Whois by the end of 2008. According to the AP article, a proposal encouraging more study of Whois abuse and the extent of personal registration is much more likely to pass.</p>
<p>But no matter what happens tomorrow when the committee meets, what is clear is that this will be an issue to watch and follow as the outcome of what is decided could drastically change the Web and how we identify the people behind the sites.</p>
<p><strong>What it Means to Us</strong></p>
<p>The good news for Webmasters dealing with copyright issues is that the Whois service is not the tool to use when locating the host of a site. <a href="http://www.plagiarismtoday.com/stopping-internet-plagiarism/3-finding-the-host/">Other tools</a>, such as DNS and IP Whois, are much more valuable in gleaning that information. </p>
<p>However, while that is great for copyright holders that favor contacting hosts and sending DMCA notices, such as myself, it doesn&#8217;t bode well for those who prefer to contact the infringer directly. In many cases, the Whois database is the only source for that information as it is not always posted on the site itself.</p>
<p>If the Whois service does disappear, I expect the following things to happen in the realm of content theft.</p>
<ol>
<li><strong>DMCA Notices Will Become Much More Popular:</strong> Expect more and more Webmasters to turn to DMCA notices and other host contacts as the methods of contacting infringers directly become more limited.</li>
<li><strong>More Subpoenas:</strong> Though the information would not be in a public database, the registrar would still have all of the pertinent information, especially if the individual paid with a credit card. Lawyers might not simply be able to look up who owns a domain, but they certainly could subpoena the information if needed.</li>
<li><strong>Less Enforcement By Novice Webmasters:</strong> The Whois service is popular for tracking down plagiarists because it is very simple to use. You punch in a domain and out comes the information associated with it. Novice Webmasters often lack the knowledge to use more advanced networking tools and, without the Whois database, would have almost no recourse.</li>
</ol>
<p>In short, veterans of dealing with content theft will likely barely notice the disappearance of the Whois service but less experienced Webmasters may find themselves lost without it.</p>
<p><strong>Conclusions</strong></p>
<p>The Whois service is a powerful tool that has some very large problems and raises some very serious concerns. There is little doubt that the service is both riddled with problems and has attracted unwanted attention from the dark side of the net. </p>
<p>However, that does not mean that one can simply ignore all of the good that it does and the potential uses for the Whois service. Proposing to kill off Whois because of its flaws is an extreme overreaction, especially when the depth of these flaws and the possibilities for remedy are not fully understood.</p>
<p>In the end, I have to agree that further study of the problem is needed before anything is decided, especially something as drastic as eliminating the service altogether. The problems that face Whois are great but trying to shut it down is just another non-answer, a way to avoid dealing with the issues.</p>
<p>Whois needs an overhaul, that much is very clear, but shutting it down and walking away not only does more harm than good, but fails to address the issues adequately and, instead, just shuffles them onto the individual registrars.</p>
<p>We, the Internet-using public, entrust ICANN to deal with the tough issues and not run away from them. Shutting down Whois, at this phase, would be exactly that.</p>
<p>We can only hope that ICANN will see this and do what is right, take a more measured approach to the problem and learn more about the issues at hand before leaping off the veritable cliff. </p>
<p>Related: <a href="http://mashable.com/2007/10/30/whois-debate/">Mashable</a></p>
<p><strong>Update:</strong> The committee voted 17-7 to <a href="http://computerworld.com/action/article.do?command=viewArticleBasic&#038;taxonomyName=security&#038;articleId=9045158&#038;taxonomyId=17&#038;intsrc=kc_top">keep the Whois database as it is</a> right now and shot down a proposal to allow &#8220;natural persons&#8221; to designate a third party agent. The &#8220;sunset&#8221; proposal was defeated 13-10. ICANN is instead commissioning additional studies on the issue and will likely be coming back to it in the coming years. With the closer margin on the sunset option, it appears more likely than previously thought. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.plagiarismtoday.com/2007/10/30/whois-service-comes-under-fire/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

