One of the hardest parts of dealing with spam, copyright infringement or other abuse issues on the Web is finding out who to report it to. To do that, typically one has to determine who is hosting the site and, though it is relatively simple with sites such as Myspace and Facebook, it gets far more complicated when dealing with blogs or sites that have their own domain names.
The techniques for determining who a host is are, at best, complicated and somewhat geeky in nature. Though I wrote a guide on how to use some tools for finding the host, the process remains one of the most common questions I get asked about.
At least one site, WhoIsHostingThis, has attempted to simplify this process, turning it into a Google-style experience. As previously reported here, the site did a respectable job in most cases, though some tests produced peculiar results.
The idea is that the networking wizardry should be hidden from the user and the site should receive a domain (or bookmarklet click) and then simply return the host. A great theory, especially for the non-tech oriented, but due to the nature of the work it is not always reliable. Most who are familiar with the tools, myself included, tended to lean on more sophisticated sites, such as DomainTools.
However, an upgrade at WhoIsHostingThis is attempting to change that, by fixing the kinks and bugs and, potentially, making the site a one-stop shop for domain hosting and information.
Some Geek Stuff
The typical way to determine the host of a site is a tool called IP Whois. Basically, IP Whois works like this:
- All servers on the Web (as well as all computers or routers facing the Web) have an IP address, a set of four numbers that each range from 0 to 255.
- Those IP addresses are controlled and doled out by various Regional Internet Registries (RIRs), non-profit oversight boards that help manage these limited resources. ARIN is the RIR for the United States, Canada and parts of the Caribbean.
- When RIRs assign IP addresses, they keep a registry of who is assigned what numbers. That information can be queried by an IP Whois.
- The most common purchasers of IP addresses are Web hosts, such as GoDaddy, ISPs, such as your cable company, and academic institutions.
- These institutions then allow their customers to use the IP address for accessing the Internet, hosting a site, etc. but usually do so only on their own network. Most of the time an IP address purchased by company X will point to a customer of their company.
- Thus, an IP Whois can usually trace you back to who is hosting a particular site or at least who is responsible for the IP address at that particular location.
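For those who would rather see the steps above than read about them, here is a minimal sketch in Python. It is only an illustration under a few assumptions: it queries ARIN's public Whois server (`whois.arin.net`) on the standard Whois port 43, the function names are my own, and the field names it looks for (`OrgName`, `NetName`, `org-name`) vary from one registry to the next. An address managed by another RIR will usually just return a pointer to that registry.

```python
import socket


def whois_query(query, server="whois.arin.net", port=43, timeout=10.0):
    """Send one Whois query and return the raw text reply.

    The protocol is simple: open a TCP connection, send the query
    followed by CRLF, then read until the server closes the connection.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_fields(whois_text, wanted=("orgname", "netname", "org-name")):
    """Pull a few 'Key: value' lines out of a Whois reply.

    Field names are an assumption; each registry formats replies differently.
    Only the first occurrence of each wanted key is kept.
    """
    found = {}
    for line in whois_text.splitlines():
        # Skip registry comments and lines that are not key/value pairs.
        if line.startswith(("#", "%")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in wanted and key not in found:
            found[key] = value.strip()
    return found


def ip_whois(domain):
    """Resolve a domain to an IPv4 address, then ask who holds that address."""
    ip = socket.gethostbyname(domain)
    return ip, extract_fields(whois_query(ip))
```

Calling `ip_whois("example.com")` needs a live network connection and returns the resolved address along with whatever organization fields the registry reported. As the rest of this post explains, that organization is the holder of the IP address, which is usually, but not always, the host.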
The procedure is far from perfect and, as we’ll explore, there are ways it can be gamed. But it is far more accurate than other methods, such as looking at the DNS servers, which can be trivially changed by spammers and plagiarists.
It is also this method that has been largely utilized by WhoIsHostingThis with great results. However, where the site has struggled has been with exceptions to the rule, cases where the IP Whois is misleading or, worse still, downright wrong.
Though these are cases that can usually be corrected with other tools, such as traceroutes (which look at the path traffic takes to arrive at the destination) or the DNS information, that information has, traditionally, not been used by WhoIsHostingThis.
That is starting to change.
The “HostGator Problem”
In March of 2007, one of the largest moves in Web hosting took place as HostGator, the very popular budget Web host, moved much of its 500,000-plus domains into The Planet’s datacenter. Though the move made sense for both parties, it created an abuse reporting kludge that remains.
The problem is this: on those half million domains, the IP Whois information points to The Planet and not HostGator, since they are located within The Planet’s network. Thus many, myself included, have sent DMCA notices or spam reports to The Planet thinking that they were the host. This has created slowdowns in addressing critical issues.
However, these problems are largely avoidable as the DNS servers, as well as other information, do point to HostGator as the host. The problem is that the information can be easily overlooked.
So, while this problem can be overcome by humans, it requires a fair amount of skill at reading networking and domain information and, even then, is prone to mistakes. WhoIsHostingThis is seeking to fix that problem by looking at multiple sources of information, including the DNS information, to determine who the host is.
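To make the idea of checking multiple sources concrete, here is a rough sketch in Python. To be clear, this is not how WhoIsHostingThis actually works, since I have not seen their code. The function name, the lookup table and the nameserver hostnames are all hypothetical, chosen only to illustrate the HostGator case.

```python
# Hypothetical table mapping nameserver domains to the host that runs them.
# A real tool would need a much larger, regularly maintained list.
KNOWN_NAMESERVER_SUFFIXES = {
    "hostgator.com": "HostGator",
    "wordpress.com": "WordPress.com",
}


def guess_host(ip_whois_org, nameservers, known=KNOWN_NAMESERVER_SUFFIXES):
    """Return (host, source), preferring a recognized nameserver over IP Whois.

    ip_whois_org: organization name reported by the IP Whois lookup.
    nameservers:  the domain's DNS server hostnames.
    """
    for ns in nameservers:
        name = ns.rstrip(".").lower()
        for suffix, host in known.items():
            # Match the domain itself or any hostname beneath it,
            # but not lookalikes such as "nothostgator.com".
            if name == suffix or name.endswith("." + suffix):
                return host, "nameserver"
    # No recognized nameserver: fall back to the IP Whois answer.
    return ip_whois_org, "ip-whois"
```

With this, `guess_host("The Planet", ["ns1.hostgator.com"])` returns `("HostGator", "nameserver")`, while an unrecognized nameserver falls back to the IP Whois organization. Since DNS servers can be changed so easily, the nameserver match is only a hint, which is why the sketch also reports where its answer came from.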
In that regard, it has already “fixed” the HostGator problem: a search on the site for a HostGator domain reveals HostGator as the host, not The Planet. A similar result happens when you look for WordPress.com domains, as it shows WordPress as the host, not Layered Technologies.
Though the site provides the additional information below the main result, in case the results are mistaken, it is right in these cases.
Though WhoIsHostingThis has already integrated many of the hosts that, like HostGator, have their IP addresses listed under another service, this is not to say that it has covered all of them. The operators of the site admit that it needs further improvements.
However, where the site was previously about 95% accurate with its information, it is now most likely well over 99%. These cases where the IP Whois was wrong were rare to begin with and the site has already fixed most of the larger outliers. This means that only a fraction of a fraction of domains should return any issues.
That being said, there are still issues and bugs to be worked out. For one, where the site does very well with U.S. and Canada-based hosts, international ones, especially those in languages other than English, seem to give the site trouble from time to time. Also, there are still at least some cases where the information might be technically correct, but does not provide a correct URL for the host or enough information to locate it.
However, as I said earlier, these are extreme outliers. For most cases, WhoIsHostingThis works very well and is certainly good enough for those who don’t have the technical expertise to use traditional networking tools.
Personally, I’ve begun using the WhoIsHostingThis bookmarklet to help me determine the host of sites and only using DomainTools or other sites whenever I get a strange result. It’s worked very well these past few weeks (since the updates began) and I’ve been impressed with the work that they have done.
Though I’m never likely to use this site, or any other site, as my exclusive resource for this kind of information (best to have confirmation no matter what you use), the improvements at WhoIsHostingThis have really impressed me.
While there is clearly work to be done, the progress is evident and I am very happy with the improvements they have been making.