A commenter on the recent post about Bitacle’s insults wrote in to defend Bitacle, saying, in part:
“Google, yahoo, and technorati scrape other people’s content every day (which I believe, there was a US court case that google won). They also display advertisements. Just like bitacle….”
“If you want your content off of bitacle, it also needs to be taken off of every other search engine that is caching it.”
While the commenter, who used the name Ricardo Sanborn, is correct that Google and other search engines do many of the same things as Bitacle, he and others like him are wrong to equate the two.
There’s a world of difference, both legal and ethical, between Bitacle (as well as other scraper sites) and the legitimate search engines. All one has to do is stop making excuses and look.
Six Points of Distinction
As I said in my reply, there are at least six points of distinction between Bitacle and the legitimate search engines:
- Lack of Opt Out – Bitacle completely ignores robots.txt files, disregards meta tags and offers no means to opt out of the site. Though Bitacle claims there isn’t any “norm that forces (them) to obey the robots.txt”, the presence of an opt out mechanism was critical to Google, and other search engines, in having their cache judged to be fair use (PDF, see pages 20 and 21).*
- Displays Full Content – Though major search engines cache Web pages, they do not display the full content of the sites they index on their own result pages. They display, at most, small snippets of content. Also, search engine caches display the content in its original context, capturing all images, licenses and attribution, whereas Bitacle merely scrapes the content and reformats it for its own site.
- Destination, Not Direction – Major search engines exist to direct users to the sites they want to see, not to be end destinations. Bitacle’s “aggregates” feature not only displays the full content of every post, but also offers a Digg feature and a comment form. Users have almost no motivation to leave Bitacle’s version and visit the original site. These are clear signs that Bitacle’s goal is not to direct users to the sites it scrapes, but to keep the traffic (and money) for itself. This means that Bitacle’s use is not transformative (a transformative use puts the copy to a purpose different from that of the original) and is thus almost certainly not fair use (see the PDF above, pages 14-16).
- All About the Benjamins – Until Bitacle’s Adsense account was forcibly shut down, Bitacle was displaying ads next to the scraped content and, at last check, was still attempting to do so (leaving the Adsense block intact). Commercial use is heavily frowned upon in fair use arguments, and profiting directly from someone else’s material without permission is generally not considered fair use or fair dealing (see the PDF above, pages 16 and 17).
- Bitacle’s Past – When Bitacle first started gaining attention, it provided no attribution to the original authors of the posts and relicensed every post it scraped under a new Creative Commons license, regardless of how the post was licensed on the original site. Though that behavior has stopped, it shows how little consideration Bitacle has for bloggers at large.
- The Spam Factor – Finally, where most search engines do not allow other sites to index their cached copies, Bitacle actively invites indexing by automatically adding search-engine-friendly metadata to every post it scrapes. It has hundreds of thousands of pages indexed in Google, most of them from the aggregates section. Bitacle may not be the largest search engine spam operation, but it is definitely one of the most dangerous to copyright holders.
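The opt out point is the most mechanical of the six. A crawler that honors robots.txt and robots meta tags checks both before it fetches or caches a page. The following is a minimal sketch in Python, using only the standard library (`urllib.robotparser` and `html.parser`), of roughly what such a check looks like. It is an illustration, not Google’s or Bitacle’s code: the bot names and URLs are hypothetical, and treating `noindex`, `noarchive` and `none` as “do not cache” signals is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Directives this sketch treats as "do not copy or cache this page".
# (Assumption for illustration; real crawlers interpret more directives.)
OPT_OUT_DIRECTIVES = {"noindex", "noarchive", "none"}


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for directive in values.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_cache(page_html: str) -> bool:
    """Return False if the page's robots meta tag opts out of caching."""
    meta = MetaRobotsParser()
    meta.feed(page_html)
    return not (meta.directives & OPT_OUT_DIRECTIVES)


if __name__ == "__main__":
    # Hypothetical robots.txt: block one scraper entirely and keep
    # every other crawler out of /private/.
    robots = """User-agent: ScraperBot
Disallow: /

User-agent: *
Disallow: /private/
"""
    post = "https://blog.example.com/2006/08/post.html"
    print(may_fetch(robots, "ScraperBot", post))  # False
    print(may_fetch(robots, "ExampleBot", post))  # True
    page = '<head><meta name="robots" content="noindex, noarchive"></head>'
    print(may_cache(page))  # False
```

Under this sketch, a well-behaved crawler skips a page when either check fails. The complaint in the first point is that Bitacle performs neither check, so a blogger has no way to say no.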
All in all, Bitacle is light years apart from Google and the other search engines in terms of both law and ethics. Any comparison between the two is flawed.
A Change of Venue
Astute readers will quickly point out that all of the laws I have cited are American and that Bitacle is located in Spain. However, it is unlikely that Bitacle would find a much friendlier audience in its home country.
Spain is part of the European Union (E.U.), and the E.U. is where Google News was successfully sued for copyright infringement by Belgian newspapers. Though the case leaves many questions unanswered, it is clear that European courts are no friends to search engines, even those that actually offer an opt out.
While it remains to be seen how a Spanish court would react to Bitacle, E.U. copyright law is notoriously strict and would be unlikely to favor Bitacle.
In short, if Bitacle cannot meet the standards of U.S. law, it will almost certainly fail those of E.U. law.
The law has made it clear that caching, for certain purposes, is very much legal and acceptable. However, Bitacle caches both the wrong way and for the wrong reasons. It is a violation of copyright law and ethically divorced from the search services it tries to emulate.
Though Bitacle apologists may make excuses and try to label those who are upset with Bitacle as hypocrites, it is Bitacle itself that faces the moral dilemma.
There is little question as to Bitacle’s legal and ethical standing, even if some people don’t want that to be the case.