Reverse Content Theft: Reflections on Scoble-gate

Robert Scoble kicked off a controversy when he was banned from Facebook for, according to him, running a script from a competing social network, Plaxo, designed to extract his Facebook account data and port it over to an account elsewhere.

The response was sharply divided. Many supported what Scoble did, while others accused him of being dishonest about the whole issue.

Personally, I feel ambivalent about the matter. Though I am not comfortable with data lock-ins, as they artificially pad a company’s customer retention, I do not support Scoble knowingly violating a TOS and then complaining when he is banned and working to obtain reinstatement.

However, Scoble-gate, as it is being called on some sites, highlights a larger, much more worrisome issue. What would happen if the data involved wasn’t the information for his Facebook contacts, but rather, his blog postings or photos?

That would be very different and it raises some difficult issues.

I did some looking around and it became clear that many sites are executing a form of reverse content theft, locking in your work to their site and making it hard, or even impossible, to get out.

We no longer have to worry only about our content appearing where we don’t want it, but also about getting it where we need it in the first place.

The Premise

The idea of content theft itself is pretty simple. When you post a work to the Web, you have a choice in what sites it appears on. You can make that choice explicitly, by posting it yourself or directly granting permission, or implicitly through a license, such as a Creative Commons License, or other broad transfer of rights.

Most of the time, we worry about our content appearing where we don’t want it. Spammers, scrapers and plagiarists can take content almost instantly and post it all over the Web, often confusing the market and diluting the search results.

However, what happens when we post content to the Web, want to move it or copy it to another site, but are prevented from doing so? It is a form of content theft in that it is a loss of control over placement. It is not traditional content theft, but it is still a limitation on the rights of a copyright holder, in this case the right to share his or her own work with whomever he or she pleases.

As we become more and more reliant on third-party services, many of which we use for free, this issue of “reverse” content theft becomes more and more serious. As we add data to our various presences on the Web, our ability to export and move it around becomes critical.

However, what I discovered when looking at many of the popular services is that most don’t offer much in the way of portability. Here is a wrap-up of some of the better-known services and what I found when trying to export my data.

Blog Hosts

Blogger: Blogger does not have any export functionality. You can modify your Blogger template to convert the site itself into an exportable format, but that does nothing to help move images or videos, just text.
WordPress.com: WordPress.com has the same import/export functionality that stand-alone installs do. Though, once again, it doesn’t help with moving media, this is at least an example of how to do exporting correctly.
LiveJournal: Offers a Web export tool as well as downloadable clients to make the process easy. Does not appear to be able to export comments, and there is no word on moving images.

Note: With blog hosts, you can do a great deal of importing and exporting through the RSS feed. However, RSS feeds typically only cover the past ten entries, don’t include comments and don’t help with obtaining media. In some cases it can help, but it is not a viable solution for large blogs.
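
To make the RSS route concrete, here is a minimal Python sketch, assuming a standard RSS 2.0 feed. It uses only the standard library. The function name `entries_from_rss` and the sample feed are hypothetical and exist purely for illustration; a real feed would be downloaded from the blog host first. The sketch pulls the title, link, date and body of each item the feed exposes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for one downloaded from a blog host.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <pubDate>Tue, 08 Jan 2008 10:00:00 GMT</pubDate>
      <description>Body of the second post.</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Mon, 07 Jan 2008 09:00:00 GMT</pubDate>
      <description>Body of the first post.</description>
    </item>
  </channel>
</rss>"""


def entries_from_rss(feed_xml):
    """Return one dict per <item> element found in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
        })
    return entries


if __name__ == "__main__":
    # Print a short listing of everything the feed makes available.
    for entry in entries_from_rss(SAMPLE_FEED):
        print(entry["pubDate"], "|", entry["title"], "|", entry["link"])
```

Only what the feed contains can be recovered this way. Entries that have dropped out of the feed, along with comments and media files, never reach the script, which is why this approach falls short for large blogs.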

Social Networks

MySpace: MySpace is a member of OpenSocial and allows a high degree of data portability. However, there is no clear way to export blog entries or photos.
Facebook: As Scoble discovered, Facebook, despite having a robust API, places severe limits on the amount of data that can leave the service and bans users that attempt to take it by force.
LinkedIn: Also a member of OpenSocial, LinkedIn allows you to export your connections into a variety of formats. Another good example of making data portable. However, LinkedIn hosts very little content produced by the user and most exportable data deals with others on the service.

Image Hosts

Flickr: Exporting images from Flickr can easily be done via their API. Though it is not an ideal solution, it is easy enough to work around.
PhotoBucket: Has no documented API and Flock does not support downloading to the hard drive (at least in my tests). Pro users, however, can download their images via FTP and others can order a CD for ten dollars. Sadly, all export options come at a price.
ImageShack: No simple way of retrieving images in bulk. Their tools only put files onto the service, not pull them down.

Video Hosts

All: Pretty much all of the video hosts are in the same position. They do not offer a means to easily download your own videos but hacks exist, especially using the Download Helper Firefox Extension, to get around that obstruction. The video sharing sites do not seem to be hostile toward such downloading. However, there is currently no means to go from a Web video to the original work as originals are destroyed shortly after conversion. It is important to save your original videos after upload.

The Problem

The issue with this limited portability of data is simple. The more you invest into a service, the harder and harder it is to get all of your own content back out.

For example, if you run a blog on a service that does not allow easy extraction, the longer you run that blog, the harder it is to move. When it is just a few entries it is feasible to hand copy/paste the text, but after a few months or years, it grows to be a monumental task.

Similarly, it is easy to save and download a few dozen images by hand, but what happens if you have hundreds, or even thousands, in your account? Should the service close down or a newer, better option come along, you may be stuck.

With paid hosting, you can always download your databases and grab all of your files via FTP. Moving to a new host is never a fun process, but it can be done without losing any data.

However, that is not the case for many of these free services. Many times, you are at their mercy. Unfortunately, as these services play a larger role both on the Web and in our lives, that lock-in could be especially dangerous.

The good news is that there are ways to mitigate this problem and prevent it from being the death of your content.

Mitigation

If you are worried about your content being locked in to a service, there are several steps that you can take to prevent that from happening.

  • Use Services That Permit Export: The first step is obvious enough, but some services are better about allowing you to export your work than others. Favor those services that make it easy to take your data with you. Paid hosts are generally the best, but some free sites, including some mentioned above, allow for easy export as well.
  • Embed Images: If you are using a free blogging service, embed your images from a photo sharing site. I talked about this previously in terms of guarding against a DMCA notice, but it also makes moving easier since you don’t have to download your images from your blog host, just copy the HTML.
  • Back Up Media Files: Too many people use their PhotoBucket or ImageShack account like an external hard drive. Keep a local copy of all of your media, especially your videos. Though some file hosts, such as Boxstr, do have the ability to export files, files hosted on third-party sites are often manipulated to make downloading and embedding easier. This often makes them lower resolution and poorer quality. This can happen to images, audio and video.

Granted, most of this is purely common sense, but it pays to think about these issues before you set up a new account or before you delve too deeply into it.

After all, most people don’t think to create an exit strategy for an account when they are creating it. As with other things in life, when something is new, most of us are only thinking about the potential while assuming it will last forever.

Sadly, on the Web, that never seems to be the case.
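
The backup advice above can be partly automated. As a minimal Python sketch, assuming you have saved your posts as HTML, the following lists every image address a post embeds so each file can be checked against your local copies. The names `ImageSrcCollector` and `image_urls`, and the sample markup, are hypothetical. The sketch only reads markup and does not download anything.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_urls(html_text):
    """Return the image addresses found in html_text, in document order."""
    collector = ImageSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


# Hypothetical post body with two embedded images.
SAMPLE_POST = (
    '<p>Trip photos:</p>'
    '<img src="http://images.example.com/a.jpg" alt="A">'
    '<img src="http://images.example.com/b.png"/>'
)

if __name__ == "__main__":
    for url in image_urls(SAMPLE_POST):
        print(url)
```

A list like this is a checklist, not a backup. The files on the host may already have been resized or recompressed, so the originals on your own drive remain the copies to keep.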

Conclusions

The idea of a lock-in is nothing new. Companies have been throwing up obstacles to leaving for centuries.

However, one thing that history has shown us is that companies do not lock in users because it is difficult to allow them to leave or burdensome to create features that make transitioning easy, but because it is better for their bottom line to keep them.

The problem with this is that, unlike cell phone contracts that simply hold our plan hostage, Web services are holding our content, something we spent a great deal of time and energy to create, hostage to their whims.

Whether it is family photos, a blog or a collection of witty videos, the content we upload to these services is a part of who we are and the fact that companies would withhold it from us as part of their business model is offensive.

This is especially true since our uploading of content is exactly what makes their business model possible in the first place. Though our relationship with these sites is symbiotic, in that we receive a free service and they get content to advertise against, one has to wonder about any partnership where one half tries to trap the other.

To be fair, none of these sites use technology to actively prevent you from moving your content elsewhere. They all leave open the possibility of hand copying/saving and don’t prevent you from doing that.

However, without automated tools to speed the process up, doing so is impractical to the point of being impossible. Considering that many of these sites prohibit users from creating their own tools for the purpose, as Facebook did, there is no practical way to escape these services.

This can be a big problem for content creators and, in some cases, can actually be a much bigger issue than traditional content theft. This will be something to discuss and follow in the weeks and months to come.
