The Image/File Hosting Problem

In 2007 I wrote an article entitled “Why I Embed My Images” that discussed how embedding images and other files can provide greater security when you feel there is a risk someone might file a takedown notice. If your images are hosted separately from your server and someone files a takedown notice over an image, your site will remain active and, with good backups, you can get everything back up more quickly.

It is a way to guard against misuse of the DMCA or fair use disputes.

However, since then I have backed away from that stance. Once I moved to my new VPS, I stopped hosting images remotely as I have a good relationship with my host and have no reason to worry. That being said, in an effort to improve the efficiency of the site, I’ve also started toying with Amazon S3 to see if it can help improve the site’s speed (the images in this post will be hosted on S3 as part of the test).

It was at this point that I realized a problem. If I were malicious in my use of S3, or any similar service, it could be used not as a way to prevent complete site failure, but as a way to avoid a DMCA notice altogether. It is possible, using these services, to trick copyright holders into filing complaints with the wrong hosts, delaying or even preventing anything from being done.

Using my own site as a test subject, I immediately began looking for a way around it and, fortunately, found one that ensures that, no matter where a file is hosted, you’ll be able to track down the host with reasonable accuracy.

The Nature of the Problem

If you right click on the images in this post and view their URL, you’ll see that they are hosted on a subdomain of Plagiarism Today named “www.plagiarismtoday.com”. This makes it appear, including to many automated tools, that the content is hosted on the same server as the rest of the site. The problem is that they are hosted on Amazon S3, clear across the country.

This trick is fairly trivial to do and only involves a minor tweak to DNS. There are many legitimate reasons for doing it, for example, hosting images on your domain while using a content delivery network to increase speed.

However, if a copyright holder decided one of these images was infringing, filing a DMCA notice would be difficult. Since the files are on a subdomain of plagiarismtoday.com, most will assume they are located on my server and act accordingly. This is due to a fluke in both the way we read URLs, where we routinely ignore subdomains, and the way networking tools routinely discard subdomain information.
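
One rough way to test that assumption is to compare the IP addresses that the page's hostname and the file's hostname resolve to. The sketch below is only an illustration: the function names are hypothetical, and the result is a hint rather than proof, since a shared address can belong to a CDN or shared hosting plan and DNS answers can vary by location. Differing addresses, though, are a strong sign that the file lives somewhere else.

```python
import socket


def lookup(host):
    """Resolve a hostname to its set of IP addresses (needs network access)."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def shares_server(page_host, file_host, resolver=lookup):
    """Return True when both hostnames resolve to at least one common IP.

    The resolver argument is an assumption made for this sketch so the
    comparison can be tried with canned data instead of live DNS.
    """
    return bool(resolver(page_host) & resolver(file_host))
```

Calling `shares_server` with the hostname of the page and the hostname taken from the image URL returns `False` when the two resolve to entirely different addresses, which is the case worth investigating before sending a notice.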

Some copyright holders, especially those less familiar with DNS and networking, might not consider this and could inadvertently file a DMCA notice or other abuse complaint with the wrong host. This can result in the complaint being delayed, outright ignored or even handled in a questionable way.

The good news is that there is a simple way around it and, as long as you are careful about how you gather your information, there is no need to make this mistake.

Dealing with Linked Files

When you’re dealing with an image file or any content that is linked into a Web page (not part of the actual HTML) it is important to make sure that you get the correct information about where that particular file is hosted, not just the page that it is on.

The solution is pretty simple:

  1. Get the URL of the File: Rather than copying the URL of the page, right click the image or the link and copy the URL. Check and see if it is on the same site, a subdomain or another domain altogether.
  2. Use Who Is Hosting This: Once you have the URL, delete the “https://” as well as everything including and after the first remaining “/” and process it through Who Is Hosting This. Who Is Hosting This handles subdomains correctly, unlike Domain Tools, which, in my testing, strips out subdomain information.
  3. Confirm the Results: You can then confirm the results by copying the IP address (you’ll have to actually copy the numbers on the site rather than use the link) and then running it through Domain Tools. Once you’ve done that, you can then go forward and begin the work of finding the DMCA or abuse agent and contacting them.
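
The URL trimming in step 2 and the IP lookup behind step 3 can also be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the tools above: the function names and the example.com addresses are placeholders, and the script only gives you the hostname and its IP addresses. You still need a whois-style lookup on those addresses to identify the host and its abuse contact.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url):
    """Return only the hostname of a file URL, keeping any subdomain."""
    # urlsplit only recognizes the host when a scheme or a leading "//"
    # is present, so add "//" for bare input like "example.com/a.png".
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no hostname found in {url!r}")
    return host


def resolve_ips(host):
    """Return the IP addresses the hostname resolves to (needs network access)."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})
```

For example, `host_from_url("https://images.example.com/uploads/logo.png")` returns `"images.example.com"`, with the subdomain intact. Passing that result to `resolve_ips` gives the addresses to check against the host lookup.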

Though this adds a few extra steps to the process, it is worth doing to ensure that you contact the correct party as doing so is the only way to guarantee the quickest and most reliable resolution.

Why This is Important

This is critical because sending a DMCA notice to the wrong host will, at the very least, greatly slow down the process. The host has to research and figure out what is going on and then decide whether to A) disable the page anyway, B) forward the notice on or C) do nothing.

Since the company that hosts the Web site does not host the image, their role under the DMCA is much less clear. Section 512(c), which usually deals with Web hosts and takedowns, only pertains to “the storage at the direction of a user of material that resides on a system or network controlled or operated by or for the service provider”. Since there is no storage, a regular DMCA notice doesn’t apply.

Section 512(d) does pertain to “information location tools” but, in that case, it would be the site owner, not the host, that is the party to the notice. This section deals with sites, such as Google, that are “referring or linking users to an online location containing infringing material or infringing activity”. Since it is the user, not the host, who is linking to the file, the application of 512(d) doesn’t make as much sense.

This isn’t to say that hosts won’t deactivate sites or remove pages if the content is embedded or hyperlinked, especially if the site is spammy in nature or has other abuse issues, but the fastest way to secure removal of images or other media files is to go to the source.

It can be a bit tedious to do, but it is well worth the time.

Bottom Line

The simple truth is that the days of all of the content on a site being hosted on the same server have long since passed. Content embedding from photo sharing sites, video sites and elsewhere has made it much more difficult to easily track down where a particular item is hosted.

Sometimes, as with YouTube clips, it is obvious where the content is hosted. Other times, as with image hosts, it is much less clear.

Unless you are dealing with textual works, which are almost never embedded (unless you use a service such as Voxant Newsroom that embeds text via Flash and JavaScript), this is something you have to constantly watch out for.

Dealing with content theft issues is not difficult, but it does require a bit of detective work. However, knowing the challenges you face and the tools that can help you overcome them can keep the sleuthing required to a minimum.
