Fair use is an important right and a critical exception to the limitations placed on users by copyright law. It is critical to free speech, the exchange of ideas and our culture as a whole.
Fair use, if done correctly, benefits everyone: copyright holders, those who want to build upon their works, and consumers.
However, some sites have sprung up that take advantage of the fair use doctrine. Rather than offering commentary, criticism or any other expression that is useful to the public, the sites are using the snippets to create something altogether different: spam.
This has led to a new kind of scraped search engine spam that has left many copyright holders scratching their heads and several scrapers turning an easy profit.
How it Works
The idea is simple. There is enough free content out there that one does not have to take all of any one feed or article in order to generate a large volume of material. If you take a small, but relevant, portion of every post you scrape, you can generate a large volume of material very quickly provided you have a large enough base of RSS feeds to start with.
In that regard, it is somewhat similar to Instant Article Ghostwriter (IAG) in that it compiles a lengthier work from very short snippets of other works. However, unlike IAG, it creates not a single article, but an entire blog (or collection of blogs) out of extremely short snippets of scraped posts.
Where a site that scraped entire posts, like Bitacle, would almost certainly encounter the wrath of the blogging community, these sites largely escape it by flying under the radar and operating in a legal gray area. They use summaries of posts, usually under 50 words; they provide links back to the full story; and a handful even properly attribute the use.
Yet the sites are unmistakably spammy. They are almost always covered with ads, often provided by Google; they use computer-generated tags and keywords (which often have nothing to do with the original post); and they provide almost no human-usable content.
Worse still, some of these sites have started sending trackbacks and pings to the original articles, making it appear as if they are legitimate follow-ups and generating even more search engine credibility.
However, what frightens many bloggers is that these sites may be operating completely legally. Though they get their content from hard-working bloggers, they are at least operating in a legal gray area which few will be willing to tread.
Is It Fair Use?
As has been discussed before on this site, there are four factors used to determine whether the copying of a work is considered “fair use”.
- the purpose and character of your use
- the nature of the copyrighted work
- the amount and substantiality of the portion taken, and
- the effect of the use upon the potential market.
Courts, generally, have put the highest emphasis on the first and fourth factors (with heaviest weight on the first) while considering the second and the third to be of less importance.
With that in mind, there is little doubt that the first factor is a strike against these sites. Their use is blatantly commercial and of no value to the public at large. In fact, their use is actually harmful to the public as it can clog search engine results and make the actual information harder to find. The use is not transformative, meaning that the material is not altered by adding new meaning or expression, and thus courts would likely not look fondly on these sites.
However, the fourth factor definitely favors the spammers. They do not damage the original work in any appreciable way and, in fact, may actually improve the potential market for the work by providing links to it.
Similarly, the second and third factors also likely favor the spammers. Though they would depend heavily upon the nature of the posts taken, both in terms of length and content, the spammers have generally taken a relatively insignificant part of the work.
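For a blogger who wants a rough sense of how much of a post a given snippet reproduces, the comparison can be sketched in a few lines of Python. This is only an illustration: the function names `words` and `share_taken` are made up for this example, and the resulting number is a word count ratio, not a legal test. Courts weigh the substantiality of what was taken as well as its length.

```python
import re


def words(text):
    """Return lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def share_taken(original, snippet):
    """Fraction of the original's words covered by the longest run of
    consecutive words that also appears in the snippet.

    Hypothetical helper for illustration only; it says nothing about
    whether the portion taken is the "heart" of the work.
    """
    orig, snip = words(original), words(snippet)
    if not orig or not snip:
        return 0.0

    # Longest common run of consecutive words (dynamic programming).
    best = 0
    prev = [0] * (len(snip) + 1)
    for o in orig:
        cur = [0] * (len(snip) + 1)
        for j, s in enumerate(snip, start=1):
            if o == s:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best / len(orig)


if __name__ == "__main__":
    post = "one two three four five six seven eight nine ten"
    excerpt = "Read more: three four five six ..."
    print(f"{share_taken(post, excerpt):.0%} of the post appears in the excerpt")
```

A 50-word excerpt of a 1,000-word post would score around 5 percent by this measure, while the same excerpt of a 60-word post would score far higher, which is why the third factor depends so much on the length of the posts taken.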
Overall, based upon how strongly courts have favored the first factor and how severely it goes against the spammers, it is unlikely that the courts would find the copying to be fair use. However, that is an issue up for significant debate and even lawyers cannot agree on what is and is not fair use.
In short, we will not know until a judge rules on it, if one ever does.
However, all of this debate assumes that the work taken is even copyrightable, something that may not always be true.
Is It Copyrightable?
Though when exactly a string of words becomes copyrightable is up for endless debate, it is worth noting that extremely short snippets might not even qualify for copyright protection.
Copyright law protects “original works of authorship fixed in any tangible medium of expression” but exactly when a phrase or a sentence becomes an “original work” is unclear in the eyes of the law.
Shorter works do enjoy copyright protection, including haiku, but if what is taken does not have originality in and of itself, it might not be protected at all.
It is unlikely that a summary scraper would take so little that the words are uncopyrightable, since such a small snippet would be useless to search engines or users. Still, it is something to be wary of, especially in cases where short image descriptions are taken.
Where Angels Fear to Tread
All in all, these blogs are almost certainly violating copyright law with what they do. However, bloggers are much less likely to take a stand against them for several reasons.
- Legal Uncertainty: Though the current court climate makes a fair use argument on the spammer’s part almost impossible, there are reasons for copyright holders to pause before taking action.
- Smaller Grievance: Since these sites don’t scrape full posts and many include links back, most bloggers don’t view the theft as being as severe as that of a traditional scraper.
- Disappearing Acts: Since these sites are little more than small-time spam operations, they tend to appear and disappear regularly. It’s rare for such a site to be around long enough to garner much wrath.
These factors combine to make it very unlikely that a layperson blogger will take any action against such a site. Though they might be annoyed at the misuse (especially if it comes with pings and trackbacks), they generally tolerate it and move on.
Still, if bloggers do decide to take action, there are things that can be done.
What To Do?
If you find that your content is being used by one of these sites, there are ways that you can fight back.
- Contact their Ad Networks: Odds are these sites are violating more than just copyright law. Contact their ad networks about any violations that they are engaging in. Made for Adsense sites
- Contact the Site Owner: Though the owners of these sites tend to be elusive, some can be contacted either by performing a WHOIS search on the domain or by looking for contact information on the site itself. Many spam site owners, even if they feel they aren’t in the wrong, will remove content upon request to avoid ugly conflicts and the potential lost revenue.
- Notify the Host: Spam, copyright infringement and scraping are all, most likely, against the TOS of the spammer’s host. Feel free to contact them or, if the content taken is sizeable enough and the host is located within the U.S., send them a DMCA notice.
- Report the Spam Blog: Finally, consider reporting the spam blog to the various sites that keep track of them. This can help the search engines avoid them and reduce their impact overall.
- Consider Prevention: There are various ways to prevent a site from scraping a feed. Though truncating a feed may not help in these situations, cloaking can.
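As a rough illustration of the cloaking idea in the last item, the sketch below assumes that cloaking means serving a different version of the feed to requests that look like they come from a known scraper. Everything named here is hypothetical: the function `choose_feed_body`, the user-agent marker and the IP address (taken from a range reserved for documentation) are placeholders, and a real setup would depend on the blogging platform and on how the scraper identifies itself.

```python
# Hypothetical markers for requests a blogger has already tied to a scraper.
KNOWN_SCRAPER_AGENTS = ("examplescraperbot",)  # placeholder user-agent fragment
KNOWN_SCRAPER_IPS = {"203.0.113.7"}            # placeholder address (documentation range)


def choose_feed_body(user_agent, remote_ip, full_feed, decoy_feed):
    """Return the decoy feed for requests matching a known scraper,
    and the normal feed for everyone else."""
    agent = (user_agent or "").lower()
    looks_like_scraper = remote_ip in KNOWN_SCRAPER_IPS or any(
        marker in agent for marker in KNOWN_SCRAPER_AGENTS
    )
    return decoy_feed if looks_like_scraper else full_feed


if __name__ == "__main__":
    full = "<rss>...full posts...</rss>"
    decoy = "<rss>...copyright notice only...</rss>"
    print(choose_feed_body("ExampleScraperBot/1.0", "198.51.100.1", full, decoy))
```

The weak point of this approach is identification: a scraper that changes its user agent or address will receive the normal feed again, so the lists need to be maintained by hand.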
All in all, you have the same options that you have for any copyright infringement case, but also the tools that come with a clear-cut case of spamming.
If the copyright issues seem muddled or not worth the effort, then the spamming issues should be much clearer.
Personally, my strongest objection to these sites isn’t the scraping itself. Though it is somewhat annoying, it is done in a way that is at least more fair to the original author than traditional scraper sites.
Instead, I am more concerned about the spam issues these sites present, especially the ones that submit pings and trackbacks needlessly. Aggregating and presenting summaries of entries on another site may be of benefit to both users and copyright holders, but only if it is done in a way that is actually useful, not just as an automated spam operation.
The difference between Technorati and a massive spam network may be difficult to describe in terms of how they collect their information, but it is easily seen when one actually tries to use them. One is a search engine that provides users with easy access to information; the other merely tries to trick the search engines into driving traffic its way.
It might seem to be a minor difference, but copyright law recognizes sites and services that serve a public good. That was critical to the Google cache being deemed fair use and would likely apply to any other site wishing to reuse content.