Abusing Fair Use

Fair use is an important right and a critical exception to the limitations placed on users by copyright law. It is critical to free speech, the exchange of ideas and our culture as a whole.

Fair use, if done correctly, benefits everyone: copyright holders, those who want to build upon their works, and consumers.

However, some sites have sprung up that take advantage of the fair use doctrine. Rather than offering commentary, criticism or any other expression that is useful to the public, the sites are using the snippets to create something altogether different: spam.

This has led to a new kind of scraped search engine spam that has left many copyright holders scratching their heads and several scrapers turning an easy profit.

How it Works

The idea is simple. There is enough free content out there that one does not have to take all of any one feed or article in order to generate a large volume of material. By taking a small, but relevant, portion of every post scraped, a site can build that volume very quickly, provided it has a large enough base of RSS feeds to start with.

In that regard, it is somewhat similar to Instant Article Ghostwriter (IAG) in that it compiles a lengthier work from very short snippets of other works. However, unlike IAG, it creates not a single article, but an entire blog (or collection of blogs) out of extremely short snippets of scraped posts.

Where a site that scraped entire posts, like Bitacle, would almost certainly encounter the wrath of the blogging community, these sites largely escape it by flying under the radar and operating in a legal gray area. They use summaries of posts, usually under 50 words; they provide links back to the full story; and a handful even properly attribute the use.
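To make the mechanics concrete, here is a minimal sketch of the kind of truncation described above: cut a post to its first 50 words and append a link back. The function name, the output format and the example URL are assumptions made for illustration; the actual scraper software is not documented here and may work differently.

```python
def make_excerpt(post_text: str, source_url: str, max_words: int = 50) -> str:
    """Return at most max_words words of post_text, followed by a link back."""
    words = post_text.split()
    snippet = " ".join(words[:max_words])
    if len(words) > max_words:
        # Mark that the original post continues past the excerpt.
        snippet += " ..."
    return f"{snippet} (source: {source_url})"


# Hypothetical 120-word post; only the first 50 words survive.
post = " ".join(f"w{i}" for i in range(120))
excerpt = make_excerpt(post, "http://example.com/post")
```

Nothing in a step like this adds commentary or new expression, which is why the result reads as filler rather than as a useful summary.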

Yet, the sites are unmistakably spammy (nofollowed). They are almost always covered with ads, often provided by Google; they use computer-generated tags and keywords (which often have nothing to do with the original post); and they provide almost no human-usable content.

Worse still, some of these sites have started sending trackbacks and pings to the original articles, making it appear as if they are legitimate follow-ups and generating even more search engine credibility.

However, what frightens many bloggers is that these sites may be operating completely legally. Though they get their content from hard-working bloggers, they are at least operating in a legal gray area into which few will be willing to tread.

Is It Fair Use?

As has been discussed before on this site, there are four factors used to determine whether the copying of a work is considered “fair use”.

They are:

  1. the purpose and character of your use
  2. the nature of the copyrighted work
  3. the amount and substantiality of the portion taken, and
  4. the effect of the use upon the potential market.

Courts, generally, have put the highest emphasis on the first and fourth factors (with heaviest weight on the first) while considering the second and the third to be of less importance.

With that in mind, there is little doubt that the first factor is a strike against these sites. Their use is blatantly commercial and of no value to the public at large. In fact, their use is actually harmful to the public as it can clog search engine results and make the actual information harder to find. The use is not transformative, meaning that the work is not altered by adding new meaning or expression, and thus courts would likely not look fondly on these sites.

However, the fourth factor definitely favors the spammers. They do not damage the original work in any appreciable way and, in fact, may actually improve the potential market for the work by providing links to it.

Similarly, the second and third factors also likely favor the spammers. Though they would depend heavily upon the nature of the posts taken, both in terms of length and content, the spammers have generally taken a relatively insignificant part of the work.
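For a rough sense of scale on the third factor, the share of a post that a snippet represents can be worked out with simple arithmetic. The sketch below assumes a hypothetical 800-word post and a 50-word summary; both figures are illustrative. Courts do not apply a numeric threshold, and a word count says nothing about how important the copied passage is to the work as a whole.

```python
def share_taken(snippet: str, original: str) -> float:
    """Ratio of the snippet's word count to the original's word count."""
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return len(snippet.split()) / original_words


post = " ".join(["word"] * 800)    # hypothetical 800-word post
snippet = " ".join(["word"] * 50)  # a 50-word summary of it
print(f"{share_taken(snippet, post):.2%}")  # prints 6.25%
```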

Overall, based upon how strongly courts have favored the first factor and how severely it goes against the spammers, it is unlikely that the courts would find the copying to be fair use. However, that is an issue up for significant debate and even lawyers cannot agree on what is and is not fair use.

In short, we will not know until a judge rules on it, if one ever does.

However, all of this debate assumes that the work taken is even copyrightable, something that may not always be true.

Is It Copyrightable?

Though when exactly a string of words becomes copyrightable is up for endless debate, it is worth noting that extremely short snippets might not even qualify for copyright protection.

Copyright law protects “original works of authorship fixed in any tangible medium of expression” but exactly when a phrase or a sentence becomes an “original work” is unclear in the eyes of the law.

Shorter works do enjoy copyright protection, including haiku, but if what is taken does not have originality in and of itself, it might not be protected at all.

It is unlikely that a summary scraper would take so little that the words are uncopyrightable, since such a small snippet would be useless to search engines and users. Still, it is something to be wary of, especially in cases where short image descriptions are taken.

Where Angels Fear to Tread

All in all, these blogs are almost certainly violating copyright law with what they do. However, bloggers are much less likely to take a stand against them for several reasons.

  1. Legal uncertainty: Though the current court climate makes a fair use argument on the spammer’s part almost impossible, there are reasons for copyright holders to pause before taking action.
  2. Smaller Grievance: Since these sites don’t scrape full posts and many include links back, most bloggers don’t view the theft as being as severe as a traditional scraper.
  3. Disappearing Acts: Since these sites are little more than small-time spam operations, they tend to appear and disappear regularly. It’s rare for such a site to be around long enough to garner much wrath.

These factors combine to make it very unlikely that a layperson blogger will take any action against such a site. Though they might be annoyed at the misuse (especially if it comes with pings and trackbacks), they generally tolerate it and move on.

Still, if bloggers do decide to take action, there are things that can be done.

What To Do?

If you find that your content is being used by one of these sites, there are ways that you can fight back.

  1. Contact their Ad Networks: Odds are these sites are violating more than just copyright law. Contact their ad networks about any violations that they are engaging in. Made-for-AdSense sites are just as forbidden as those that infringe upon copyright. You can also, if the theft is substantial enough, report the copyright violation to the ad network.
  2. Contact the Site Owner: Though the owners of these sites tend to be elusive, some can be contacted either by performing a whois search on the domain or by looking for contact information on the site itself. Many spam site owners, even if they feel they aren’t in the wrong, will remove content upon request to avoid ugly conflicts and the potential lost revenue.
  3. Notify the Host: Spam, copyright infringement and scraping are all, most likely, against the TOS of the spammer’s host. Feel free to contact them or, if the content taken is sizeable enough and the host is located within the U.S., send them a DMCA notice.
  4. Report the Spam Blog: Finally, consider reporting the spam blog to the various sites that keep track of them. This can help the search engines avoid them and reduce their impact overall.
  5. Consider Prevention: There are various ways to prevent a site from scraping a feed. Though truncating a feed may not help in these situations, cloaking can.
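As an illustration of the cloaking mentioned in the last item, here is a minimal sketch that serves a decoy feed to clients you have flagged and the full feed to everyone else. The marker strings, the function name and the idea of a decoy feed are all assumptions for the example; a real list would come from your own server logs, and user-agent strings are easy to fake, so this only stops careless scrapers.

```python
# Hypothetical user-agent fragments; replace with ones seen in your own logs.
BLOCKED_AGENT_MARKERS = ("examplescraper", "feedgrabber")


def choose_feed_body(user_agent: str, full_feed: str, decoy_feed: str) -> str:
    """Serve the decoy feed to flagged clients and the full feed to everyone else."""
    agent = user_agent.lower()
    if any(marker in agent for marker in BLOCKED_AGENT_MARKERS):
        return decoy_feed
    return full_feed


# A flagged client gets the decoy; an ordinary feed reader gets the real feed.
flagged = choose_feed_body("ExampleScraper/1.0", "<full/>", "<decoy/>")
normal = choose_feed_body("SomeFeedReader/2.3", "<full/>", "<decoy/>")
```

A check like this only helps when the scraper pulls directly from your own feed. It does nothing if the content is taken from a third-party index.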

All in all, you have the same options that you have for any copyright infringement case, but also the tools that come with a clear cut case of spamming.

If the copyright issues seem muddled or not worth the effort, then the spamming issues should be much clearer.

Conclusions

Personally, my strongest objection to these sites isn’t the scraping itself. Though it is somewhat annoying, it is done in a way that is at least more fair to the original author than traditional scraper sites.

Instead, I am more concerned about the spam issues these sites present, especially the ones that submit pings and trackbacks needlessly. Aggregating and presenting summaries of entries on another site may benefit both users and copyright holders, but only if it is done in a way that is actually useful, not just as an automated spam operation.

The difference between Technorati and a massive spam network may be difficult to describe in terms of how they collect their information, but it is easily seen when one actually tries to use them. One is a search engine that provides users with easy access to information; the other merely tries to trick the search engines into driving traffic its way.

It might seem to be a minor difference, but copyright law recognizes sites and services that serve a public good. That was critical to the Google cache being deemed fair use and would likely apply to any other site wishing to reuse content.

10 Responses to Abusing Fair Use

  1. Maria says:

    For instance… I discovered this use of my post and although I didn't like it because, in general, I don't like to be included in aggregated content used for commercial purposes, the fragment they are using is so small, and they link back to me. I don't like that they identify the author as "English", but in the end I didn't feel like I could do anything about it.

    They seem to be aggregating not from my feed, but from Technorati's tag feeds which I can't control. I spent a tremendous amount of time looking for ways how I can opt out of this in Technorati but couldn't find anything.

    How can I contact this site's owners and ask them to get me off their list and stop grabbing my content if they're getting it from Technorati?

  3. Alfred says:

    I think this web site is not so bad, as you are writing. On every site, where your post snippet is to see is a normal “follow” link to the original author. Whats the problem? A snippet is not an article. Every newspaper has more “stolen” content than this small blog. If every site would be so fair and would place a backlink i would be very happy.

  4. Maria says:

    Alfred, did you bother to read the article? Did you even read my full comment?

    I said "[but] the fragment they are using is so small, and they link back to me" and "I don’t like to be included in aggregated content used for commercial purposes".

    As I said, and Jonathan points out, this use is not terrible enough to make me take the same actions I would take with outright plagiarism. But that's exactly what Jonathan is talking about on this article.

    I have found the full text of one of my posts reproduced without my permission on a different web site, and I have not screamed or done anything about it because that site is actually using my content to add value: They are educating consumers on how bad a certain company is. And they do NOT generate any ad revenue with such content.

    On the other hand, the small blog you defend so passionately is not adding any value at all. They are supposedly aggregating content related to "English" and my post is not remotely close to that. So, it's clear that they are simply aggregating content indiscriminately solely for the purpose of generating ad revenue… Not terribly wrong, but wrong nonetheless.

    Hope that helps.

  5. Maria says:

    To get back to the main point of my initial comment, what I'm most concerned about is the lack of control over the distribution of my content when scrapers are not grabbing it from my feeds (which I can control), but from Technorati's feeds or API, and no opt-out options are available to me.

    For instance, Flickr allows people to use their API and all public content posted on the site, but they also allow users an option to exclude their content from such API. Not the same with feeds: I have no way of excluding my content from Flickr's public feeds, or Technorati feeds.

  8. JB says:

    Alfred,

    As I said in the article, my concern with these sites has less to do with copyright and more to do with the spam issue. The site I linked to sent me at least two trackbacks and those were just the ones that got through my spam filters.

    In terms of copyright infringement, they aren't horrible, but what they spawn is pretty disgusting.

    Maria,

    I couldn't have said it better myself to be honest. The reason I didn't comment sooner was I didn't know what I could add.

    To jump in the API chain problem, it's something I'm going to cover soon. I touched on it briefly a while back when discussing spam blogs that escape FeedBurner, but you are right it requires more in depth coverage. I'm writing some friends now.

    Hope that you are well!
