5 Changes Making Content Tracking More Difficult

Jonathan Bailey, March 17, 2011


The Internet is not and has never been a static creation. What it is and how it works have always been in a constant state of flux and that, in turn, has drastically changed how we work with it.

These changes have impacted nearly everything we do on the Web, from how we talk to friends (email vs. Facebook) to how we look up information (Google Books, Google Scholar, etc.).

However, these changes also have an impact on how we track and, at times, enforce our content. While, overall, the Web is becoming a much more open place and the tools for finding what’s out there are getting better, there are some technologies that are making things more difficult.

With that in mind, here are five of the bigger changes I’m seeing that are hindering content monitoring and what, if anything, can be done about them.

1. Walled Gardens

Walled gardens have been a problem for a long time but, for much of the time I’ve been working, the problem has been primarily limited to smaller message boards and communities. In some niches it was worrisome, but generally not a major problem.

However, with social networks closing off more and more of the Web and these sites becoming a go-to destination for many people who just want to create a quick Web presence, an increasing amount of plagiarism, infringement and legitimate content sharing alike is taking place where search engines can’t see it.

This creates new challenges in finding content, whether for enforcement purposes, for statistics gathering or to just participate in the conversation.

2. Content Delivery Networks

A Content Delivery Network, or CDN, is simply a group of server farms spread out all over the world to provide a closer point of access for content. For example, if you load an image from a CDN and you are in New York, you’ll likely get it from a server in the U.S., whereas someone who tries to load the same file from Hong Kong will likely get it from a Chinese server.

Content Delivery Networks are great in that they improve the speed and reliability of content delivery, but they also create questions regarding where the content is hosted.

This is because, where once most content existed primarily in just one location, it can now exist in dozens of locations at once, making it difficult to track.

Still, one DMCA notice can usually remove content across an entire CDN. However, CDNs sometimes also sit in between sites and their visitors, as with Cloudflare, adding an extra layer to pierce. (Disclosure: This site uses Cloudflare.)

CDNs are not new, but they were traditionally expensive. With tools like Cloudflare and Amazon CloudFront, they are now within reach of nearly any webmaster.

3. Embedding

Embedding or hotlinking content has always been common but now it is virtually ubiquitous. With dozens, if not hundreds, of image and other content hosts specializing in providing content hosting for other sites, often for free, it is a crapshoot as to whether or not the content within a page is hosted on the same server as the page itself.

The best thing you can do is check the URL of the actual image, audio or video file and look for the host of that particular piece of content, rather than the page itself. Though this is easy with YouTube clips and even most audio players, it can be a pain for images as you need to make sure to grab the image URL, not the page URL, when doing your checks.
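As a rough illustration of that check, here is a minimal Python sketch that pulls the media URLs out of a page’s HTML and reports the host of each file rather than the host of the page. The function names and the list of tags are my own assumptions for the example, not part of any particular tool, and it only looks at plain src attributes, so content loaded by scripts would be missed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags that commonly carry embedded media in a "src" attribute
# (an assumed list for illustration, not an exhaustive one).
MEDIA_TAGS = {"img", "audio", "video", "source", "embed", "iframe"}


class MediaHostParser(HTMLParser):
    """Collects (absolute_url, hostname) pairs for embedded media."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.media = []

    def handle_starttag(self, tag, attrs):
        if tag not in MEDIA_TAGS:
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative and protocol-relative URLs against the page URL.
        absolute = urljoin(self.page_url, src)
        self.media.append((absolute, urlparse(absolute).hostname))


def media_hosts(page_url, html):
    """Return every embedded media URL on the page with its host."""
    parser = MediaHostParser(page_url)
    parser.feed(html)
    parser.close()
    return parser.media


def offsite_media(page_url, html):
    """Return only the media hosted somewhere other than the page's host."""
    page_host = urlparse(page_url).hostname
    return [(url, host) for url, host in media_hosts(page_url, html)
            if host != page_host]
```

Anything returned by offsite_media is a file whose host, not the page’s host, is the one to look up when doing your checks.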

4. Hyper Dynamic Content

If you’re tracking text works, especially shorter phrases or even names for trademark evaluations, you will likely run across sites that update very quickly. On these sites, the content cycles off the page it is supposed to be on in the time between when the search engine crawls it and when you go to read it, even if the gap is only a few hours.

Some sites, especially automatically generated spammy ones, update so quickly that even Google can’t crawl them quickly enough. While these aren’t worrisome as they don’t usually make much of an impact with the search engines, they create false positives that can interfere with finding real results.

This is especially true as these domains often pop up multiple times, cluttering up reports with needless garbage that can’t be effectively tracked down or filed against.
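One way to keep that clutter out of a report is to re-check each hit against what the page says now and to group the dead hits by domain. The Python sketch below assumes you already have each hit’s URL and the current text of the page; the function name, the threshold and the input format are all hypothetical choices for the example, and it does no fetching itself.

```python
from collections import Counter
from urllib.parse import urlparse


def triage_hits(hits, phrase, noisy_threshold=3):
    """Split search hits into confirmed and stale, and flag noisy domains.

    hits: iterable of (url, current_page_text) pairs (assumed input format).
    phrase: the text being tracked.
    noisy_threshold: stale hits per domain before it is flagged (assumption).
    """
    # Compare case-insensitively and ignore differences in whitespace.
    needle = " ".join(phrase.lower().split())
    confirmed, stale = [], []
    for url, current_text in hits:
        haystack = " ".join(current_text.lower().split())
        if needle in haystack:
            confirmed.append(url)
        else:
            stale.append(url)
    # Domains that keep producing hits that no longer contain the phrase.
    stale_counts = Counter(urlparse(url).hostname for url in stale)
    noisy = {host for host, count in stale_counts.items()
             if count >= noisy_threshold}
    return confirmed, stale, noisy
```

Domains that land in the noisy set are candidates for excluding from future reports, though that is a judgment call since a fast-moving site can still carry a real match on any given day.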

5. Translation Software

Automated translation is by no means “good” but it’s gotten to a point where, most of the time, one can at least understand what is being said. That’s far enough for some spammers and plagiarists as well as those who simply want to legitimately share work.

The problem is, plagiarism detection software is very poor at detecting translated plagiarism and probably won’t be getting any better in the near future.

The best you can hope to do, if you suspect that someone may use such a tool to plagiarize your work, is to use it yourself and see if you can find any matches. It is a gamble to say the least, one not worth taking unless you have a good reason, but it can work in some situations.

Bottom Line

All in all, the Web is a more open place than it was a few years ago and content is much easier to track. However, not every advancement is going to make things easier on us and many, as with those above, will make things much harder.

Because of the changing climate, content creators have to adapt and learn, shifting their strategies to try and make the best out of the situation.

This isn’t limited to just content detection; it extends to licensing, business model development and much more. If you’re working on the Web, you can’t get stuck in just one mindset and you have to watch and grow. It’s the only way to survive.