The Image/File Hosting Problem
By Jonathan Bailey • Mar 25th, 2009 • Category: Articles, DMCA, Legal Issues
In 2007 I wrote an article entitled “Why I Embed My Images” that discussed how embedding images and other files can provide greater security when you feel there is a risk someone might file a takedown notice. By separating your images from your server, should someone file a takedown notice over an image, your site will remain active and, with good backups, you can restore any affected files more quickly.
It is a way to guard against misuse of the DMCA or fair use disputes.
However, I have since backed away from that stance. Once I moved to my new VPS, I stopped hosting images remotely, as I have a good relationship with my host and have no reason to worry. That being said, in an effort to improve the site's efficiency, I’ve also started toying with Amazon S3 to see if it can improve the site’s speed (the images in this post are hosted on S3 as part of the test).
It was at this point that I realized there was a problem. If I were malicious in my use of S3, or any similar service, it could be used not as a way to prevent complete site failure, but as a way to avoid a DMCA notice altogether. Using these services, it is possible to trick users into filing complaints with the wrong host, delaying or even preventing any action from being taken.
Using my own site as a test subject, I immediately began to seek a way around this and, fortunately, found one that ensures that, no matter where a file is hosted, you’ll always be able to track down the host with reasonable accuracy.
The Nature of the Problem
If you right-click on the images in this post and view their URLs, you’ll see that they are hosted on a subdomain of Plagiarism Today named “files.plagiarismtoday.com”. This makes it appear, including to many automated tools, that the content is hosted on the same server as the rest of the site. In reality, they are hosted on Amazon S3, clear across the country.
This trick is fairly trivial to do and involves only a minor tweak to DNS: pointing the subdomain at the remote host, typically with a CNAME (alias) record. There are many legitimate reasons for doing it; for example, hosting images on your domain while using a content delivery network to increase speed.
However, if a copyright holder decided one of these images was infringing, filing a DMCA notice would be difficult. Since the files are on a subdomain of plagiarismtoday.com, most will assume they are located on my server and act accordingly. This is due to a quirk both in the way we read URLs, where we routinely ignore subdomains, and in the way networking tools routinely discard subdomain information.
Some copyright holders, especially those less familiar with DNS and networking, might not consider this and could inadvertently file a DMCA notice or other abuse complaint with the wrong host. This can delay the complaint's resolution, get it ignored outright or even cause it to be handled in a questionable way.
The good news is that there is a simple way around it and, as long as you are careful about how you gather your information, there is no need to make this mistake.
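For readers who want to see the DNS side of this, here is a minimal sketch using only Python's standard library. It assumes the subdomain is set up as an alias (a CNAME record) for the remote host, which is the usual way a service such as Amazon S3 is mapped onto your own domain. `describe_host` and `same_server` are hypothetical helper names, not part of any tool mentioned in this article. On most platforms `socket.gethostbyname_ex` reports the canonical name behind an alias, but resolver behaviour varies by system, so treat the output as a hint rather than proof.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Same shape as socket.gethostbyname_ex: name -> (canonical name, aliases, IPv4 addresses)
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def describe_host(hostname: str, resolver: Resolver = socket.gethostbyname_ex) -> Dict[str, object]:
    """Resolve a hostname and report its canonical name and IPv4 addresses.

    If the canonical name differs from the name queried, the hostname is an
    alias, e.g. a CNAME pointing at a CDN or a storage service.
    """
    canonical, _aliases, addresses = resolver(hostname)
    queried = hostname.lower().rstrip(".")
    canonical = canonical.lower().rstrip(".")
    return {
        "queried": queried,
        "canonical": canonical,
        "is_alias": canonical != queried,
        "addresses": sorted(addresses),
    }


def same_server(page_host: str, file_host: str, resolver: Resolver = socket.gethostbyname_ex) -> bool:
    """Rough check: do the page's host and the file's host share any IPv4 address?"""
    page_ips = set(describe_host(page_host, resolver)["addresses"])
    file_ips = set(describe_host(file_host, resolver)["addresses"])
    return bool(page_ips & file_ips)
```

Calling `describe_host("files.plagiarismtoday.com")` would show, at the time you run it, whether that subdomain is an alias and which addresses it resolves to; the answer depends on live DNS. Note that a shared IP address does not prove two names are on the same server, since large hosts and CDNs put many sites on the same addresses.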
Dealing with Linked Files

When you’re dealing with an image file or any other content that is linked into a Web page (rather than being part of the actual HTML), it is important to get the correct information about where that particular file is hosted, not just where the page it appears on is hosted.
The solution is pretty simple:
- Get the URL of the File: Rather than copying the URL of the page, right-click the image or the link and copy its URL. Check whether it is on the same site, a subdomain or another domain altogether.
- Use Who Is Hosting This: Once you have the URL, delete the “http://” as well as everything including and after the first remaining “/” and process the result through Who Is Hosting This. Who Is Hosting This handles subdomains correctly, unlike Domain Tools, which in my testing strips out subdomain information.
- Confirm the Results: You can then confirm the results by copying the IP address (you’ll have to actually copy the numbers on the site rather than use the link) and running it through Domain Tools. Once you’ve done that, you can go forward and begin the work of finding the DMCA or abuse agent and contacting them.
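The hostname-trimming part of the second step can also be done programmatically. This is a minimal Python sketch, assuming plain http(s) URLs; `host_from_url` and `lookup` are hypothetical names, and the reverse DNS lookup in `lookup` is only a rough local stand-in for the confirmation step, not how Who Is Hosting This or Domain Tools actually work.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Return only the hostname (subdomain included) from a file URL.

    Mirrors the manual step: drop "http://" and everything from the first
    remaining "/" onward. urlsplit also discards any port number or login.
    """
    url = url.strip()
    # Assumes http(s) URLs; urlsplit only finds the host after a "//" prefix.
    if not url.lower().startswith(("http://", "https://", "//")):
        url = "//" + url
    hostname = urlsplit(url).hostname  # urlsplit lowercases the hostname
    if not hostname:
        raise ValueError(f"no hostname found in {url!r}")
    return hostname


def lookup(url: str) -> Tuple[str, str, Optional[str]]:
    """Resolve the file's hostname to an IPv4 address, then try reverse DNS.

    Needs network access. The reverse name, when one exists, often belongs
    to the hosting provider and can hint at who actually serves the file.
    """
    host = host_from_url(url)
    ip = socket.gethostbyname(host)
    try:
        reverse_name: Optional[str] = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        reverse_name = None  # many addresses have no reverse (PTR) record
    return host, ip, reverse_name
```

For example, `host_from_url("http://files.plagiarismtoday.com/images/example.jpg")` returns `"files.plagiarismtoday.com"`, keeping the subdomain that many tools throw away. Because many addresses have no reverse record, the web-based lookups described above remain the better confirmation.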
Though this adds a few extra steps to the process, it is worth doing to ensure that you contact the correct party, as that is the only way to guarantee the quickest and most reliable resolution.
Why This is Important
This is critical because sending a DMCA notice to the wrong host will, at the very least, greatly slow down the process, as the host has to research what is going on and then decide whether to A) disable the page anyway, B) forward the notice on or C) do nothing.
Since the company that hosts the Web site does not host the image, its role under the DMCA is much less clear. Section 512(c), which usually deals with Web hosts and takedowns, only pertains to “the storage at the direction of a user of material that resides on a system or network controlled or operated by or for the service provider”. Since the site’s host is not storing the image, a regular DMCA notice doesn’t apply.
Section 512(d) does pertain to “information location tools”, but in that case it would be the site owner, not the host, who is the proper party for the notice. This section deals with sites, such as Google, that are “referring or linking users to an online location containing infringing material or infringing activity”. Since the host isn’t the one linking to the file (the user is), applying 512(d) to the host doesn’t make as much sense.
This isn’t to say that hosts won’t deactivate sites or remove pages if the content is embedded or hyperlinked, especially if the site is spammy in nature or has other abuse issues, but the fastest way to secure removal of images or other media files is to go to the source.
It can be a bit tedious to do, but it is well worth the time.
Bottom Line
The simple truth is that the days of all of a site’s content being hosted on the same server have long since passed. Embedding content from photo sharing sites, video sites and elsewhere has made it much more difficult to track down where a particular item is hosted.
Sometimes, as with YouTube clips, where the content is hosted is obvious; other times, as with image hosts, it is much less clear.
Unless you are dealing with textual works, which are almost never embedded (unless you use a service such as Voxant Newsroom that embeds text via Flash and JavaScript), this is something you have to constantly watch out for.
Dealing with content theft issues is not difficult, but it does require a bit of detective work. However, knowing the challenges you face and the tools that can help you overcome them can keep the sleuthing required to a minimum.
Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

This covers your discussion of hypothetical DMCA suits. But what about the second part of your previous blog post pertaining to countering slashdotting?
What changed your opinion? You initially argued that keeping all your eggs in one basket would take your site down if your traffic spiked, not to mention the increase in bandwidth costs if you are on a dynamically priced, scalable bandwidth plan.
Newcomer to your blog, by the way. Love it, and I'm confounded that it hasn't received a bigger following as of yet.
Good point. I covered the DMCA issue but not the server one.
The biggest factor in my decision to move back to one server (before moving my static files to Amazon S3) was that I purchased a VPS. My server can handle many, many times the load it carries now. If I were sloppier with caching and hadn't moved my static files to S3, that likely wouldn't be the case, but right now my load is a very small fraction of the maximum it can handle.
The VPS has weathered several traffic spikes already without a problem so I am not too worried.
As for the site not getting as much of a following as it deserves, I'm not 100% sure why either. Some of it is my doing; the name is terrible, I admit that. Some of it is that a measured, effective approach to licensing and protecting content strategically just isn't cool (read the articles on Digg and Techmeme and you'll see what I mean), and some of it is just that the keywords this topic generates are snoozers.
However, we have been in a period of pretty rapid growth over the past month or so, so I guess we'll see…
Thanks for the interest and if you have any ideas on how to get this site more “out there”, I am definitely open.
Thanks for the follow-up.
You seem to be keeping yourself well-covered with all the bookmarking links, along with a presence on Disqus, Twitter and, to some extent, Tumblr. Your use of the latter two might be extended by creating yet another set of profiles that announce new article headlines via RSS. The tool HootSuite supports this for Twitter, I believe, while Tumblr has a native RSS syndication feature, though I am not aware of whether it merely posts the article titles or the articles themselves verbatim.
Social Web aggregation sites BackType and FriendFeed (check out the beta) could also be some pastures you would want to look into. BackType aggregates your commenting presence across various systems, while FriendFeed aggregates the bulk of your online presences across the whole web. Additionally, Disqus allows you to merge what people on FriendFeed say about your blog with your homepage's Disqus comments, though this may disrupt any structured, linear ongoing debate. Alternatively, the two can be kept separate by means of the feature called Social Media Reactions (http://blog.disqus.net/2009/04/02/social-media-reactions/), which also shows comments from such sites as Twitter.
If I were to have one main gripe about the vast majority of websites, it's that they give you the impression that if you didn't follow the blog from the start, you have no way of acquiring the knowledge shared up until now, unless you use the archive button and keep going back, back, back … and back. To some extent, it creates an image of the blog posts not mattering particularly, merely serving as SEO- and money-driven content, which is clearly not the case for you or for a site such as Daily Blog Tips. Some site owners, however, see this problem and publish books based on what they've written, making it available, i.e. accessible, in an organized fashion. This is a boon both to the select few who've been with the blog from the start, as they no longer have to go through a mess of disorganized bookmarks from whenever the author struck a chord with them, and to newcomers, who save time by having the author trim all the information, cutting out redundancies and earlier versions and leaving only what is best and what (still) matters. Not to mention the general advantages of print versus digital.
In your case, newcomers would get to see your recommendation for the best watermarking or plagiarism detection tool, rather than seeing you go through the lot of them without getting a tangible idea of which you prefer and, ultimately, recommend. As I take a second glance at your articles, a great many of them concern discussions of services and tools; even so, after reading them, I don't get the impression that I've learned whether the products discussed are among the best or worst of the bunch. It reads like a news blog: “X new product is out; let's take a look at it.” Instead, I'd like a review of sorts and a summary at the end telling me whether it's one of the best and, bad or not, what the best products in the business are according to you. Compare it to a mobile phone review, which, ideally, would state something along the lines of “Is the new BlackBerry an iPhone killer? (…) A great product, which might rival the iPhone, which we believe to be the best in the market; these and these aspects set them apart. Take your pick according to which characteristics you find to your liking.” And, possibly, separating the legalese articles from the product reviews might be a feasible suggestion. I have only just discovered your blog, so I'm not entirely sure how large a share of your articles they make up, but that's just an observation on my part.
Otherwise, your older articles seem to get left behind, even though full use could be made of them; some of them need to build upon each other, like the aforementioned review articles, in order to keep their legitimacy. Not that you need to organize it in a book for it to work (even though lulu.com makes such a prospect surmountable), but allow your readers to study various topics that fit their own agenda, and to see the conclusions of various articles on the same topic as mentioned above. You've already anticipated this to a pleasing extent with the DMCA-filing guidelines in the top menu bar. I'd like to see more of that philosophy applied to the entirety of your articles. Not doing so tends to put articles of a certain age into retirement, unless someone stumbles upon them or you link back to them in a newer article. Doing so ensures that people bookmark your site and visit it regularly, rather than meandering towards it whenever it's dugg or slashdotted, as with a site like Cracked.com, which relies on word of mouth and striking a chord with particular articles rather than keeping a steady userbase who subscribe via RSS. Database and knowledge repository versus article dispenser, one might put it crudely. Compare Wikipedia to the New York Times, if you will.
As for reeling in more visitors, I would expect some targeted zeitgeist articles to get some attention, at least if you could arrange for some lobbying users to spread the word. One topic that I believe would garner attention is the overlooked but very important matter of copyright for forum posts (and, less interestingly, blog comments). The details on this matter are somewhat murky to an outsider, and not knowing your rights might be a disincentive to take part in larger discussions. Of course, the role of the victim is as interesting as that of the perpetrator, who may in fact not know about his or her own transgressions. And what is the responsibility and role of the forum owner in all this, toward both parties in the dispute? Another topic is the tendency to copy-paste news articles verbatim on a forum, even though this clearly infringes copyright. On videogame forums, which frankly make up the entirety of the biggest forums, magazine scans can often be seen embedded. (The two biggest forums within this sphere are NeoGAF and Something Awful, the former being the more tangible in terms of overview, which also means your site would be more likely to get slashdotted if they posted some of your writing; hopefully not verbatim!) As an extra spice, you could make your “internet forum legal guidelines” available under a Creative Commons license, perhaps allowing verbatim copy-pasting with proper attribution. That way, you'd get some exposure while ensuring a cleaner legal environment in an otherwise grey area of internet jurisprudence.
Don't read this as some scorching criticism; just as remarks regarding untapped potential of an already great site.
Thank you very much for this and I do not read it as a criticism at all. I appreciate the help. I am going to take some time to consider this and will probably work on some of your points shortly. I do need some time to digest it, though; there's quite a lot here.
Thank you very much for your help and suggestions. They are very much appreciated!
I finally recalled the term I was looking for to succinctly make my point (I could only remember it in Danish): Cumulative information.
Don't start over and archive the previous writing every single time; try to delve into all the topics as deeply as possible and, whether they've been dissected to the point of exhaustion or not, always put the newfound or -disseminated knowledge together in a summary. Some people do this in the form of a book; others find other ways to do it.
A little late, but the term was important to clarify my remarks.
Yes, eBay just got my site, http://www.eBuster.co.uk, taken down. The site links back to scammer accounts on eBay so people can see for themselves what these scammers are doing, and it also contained copies of many pages I had preserved, because eBay has a habit of trying to hide pages in an effort to prevent the course of justice.
The site also contains millions of eBay member names that can be searched, but I’m not sure whether that comes under copyright law, as it is possible to search Google for member names and get links directly into eBay accounts.
I think my best bet is to take a leaf out of eBay’s book, which is to sidestep many laws here in the UK, including the FSA, by having eBay’s registered office in Luxembourg, so maybe I need to look offshore myself for my service provider.
Maybe it’s becoming against the law to expose fraud on eBay, as the only way I can see anyone making a case is by linking to eBay pages themselves or by taking static copies of pages, but it seems both are prohibited under DMCA rules. eBay certainly seems to have its way with the law here in the UK when it comes to the Birmingham police and trading standards, who point-blank refuse to accept concrete evidence linking a fraudster to no fewer than eight eBay accounts being used to sell death traps as cars.
What seems to have really upset eBay was a fake login page hosted on about.ebay.com that I exposed on my site by taking a copy of the page and adding big red text warning people not to use the page, as it was a fake.
Any advice would be appreciated before I decide how I should go about hosting the site, which is dedicated to exposing multiple frauds on eBay that eBay is turning a blind eye towards.