Updated: See end of article for update
The Investor Relations Web Report calls it "the new plagiarism". (Note: The original blog post is down.) Dan Zarella from Puritan City calls those who engage in it "the best plagiarists". Others simply call them bloggers or, as Zarella also put it, "Human Aggregators".
They’re a new breed of content users who walk a gray area between what is clearly fair use and what is obviously content theft. Their blogs are marked with large swaths of block quotes and heavy content reuse, but also proper attribution and at least some original content.
These sites, as they’ve grown in number, have created a great deal of controversy among bloggers who are left to wonder if they are nothing more than content thieves in disguise.
Block quotes by the Dozen
These sites, which for this article I’ll simply call "gray", are generally identified by a large number of very short posts, with much of their content in block quotes or otherwise lifted directly from elsewhere. Though they meticulously credit their sources, bowing to more traditional rules for blog attribution, and work to add at least some original content, usually over half of their material comes from other sources.
This has caused many bloggers to worry that these gray blogs might be trying to get away with content theft under the guise of legitimate attribution. The idea is that they can create a much larger volume of content if they only have to write a small portion of it. Users will simply visit the gray blogs, since they are able to provide so much more information and, because of the liberal quoting, will have no reason to visit the original source. After all, they already have most of the critical information.
While gray blogs certainly don’t pose the same threat or raise the same concerns as spam blogs and other content scrapers, the cause for concern is clear. Even though blogging is about sharing and reusing information, excessive sharing threatens the authors penning the original content. The tale of the goose laying the golden egg springs to mind as, quite simply, greed can be the blogging world’s biggest enemy.
A Separation of Degrees
What makes this issue so difficult to address, and so difficult to write about, is that it’s not so much about gray blogs but, rather, various shades of gray blogs. The difference between someone simply quoting blogs and someone trying to tweak the system is not a clear-cut matter, but a separation of degrees.
Quoting, even liberal quoting, is expected of blogs. It’s a part of researching a story and covering ongoing stories, as well as sharing information. If done properly, it can not only be used to create a new work, but also drive valuable traffic to the original site. In the blogging world, being the source is often a badge of honor.
However, basing your entire site, or even a large percentage of it, on quoted content is viewed differently. Being a source in a larger article is one thing, but having your content make up the majority of an article on another site is another. What distinguishes one from the other is unclear at best. There are no math formulas or systems for determining what is right or what is too much.
More confusing still, everyone has a different idea of what constitutes content theft. With Creative Commons Licenses
The challenge is to strike a balance and set some kind of guideline that is compatible with copyright law and acceptable under the current code of blogging ethics, but that also addresses the concerns many bloggers share over gray sites.
A Proposed Solution
When I first looked at the problem, I was tempted to set guidelines by which a blogger should not get more than X percent of their overall content from other sites or use more than Y lines from another entry. All ideas along those lines, however, quickly fell through.
First, some sites, like Engadget, get a majority of their information from other sources and, correctly, have never been accused of content theft. (Correction: Engadget does write their own copy but reuses many photographs. I apologize for the misunderstanding.) Second, given the varied lengths of posts and methods of reuse available, almost any guideline system would quickly run afoul of fair use and, in other cases, would permit reuse that would almost certainly be questionable. Any attempt to work around these factors would complicate a rule that, supposedly, had the sole benefit of being simple.
In lieu of a hard and fast rule, and much like the fair use provision itself, we need a framework for determining whether a reuse is ethical or not. This framework would contain the following elements, many of which are found in the standard fair use provision:
- The amount of reused content compared to the amount of original content.
- The amount of reused content in relation to the original work.
- The frequency with which large blocks of text are used.
- What is gained by the original author.
- Whether permission was granted in advance, either through a CC license or direct permission.
- Whether attribution was provided or not.
- Other indications as to the intent of the one reusing the work, including excessive advertisements, links to one’s own sites and other forms of profiteering or over the top promotion.
(Note: As with everything I do like this, these elements are a draft and are open to both comment and revision.)
Such a system, while not perfect or easy, would provide guidelines both for pursuing content theft and for reusing others’ works. Though it might be subjective in many respects, it does give people pause to think about what they are doing beforehand and at least some standard of conduct to follow.
With file sharing, blogging and content trading more popular than ever, copyright has become something of a dirty word. Many people are obsessed not with how to best disperse information and participate in this sharing revolution, but with how much they can get away with legally and ethically.
In a parallel to the famous John F. Kennedy quote, we need to stop asking what others can do for us, and ask what we can do for them. Rather than simply wondering what we can get away with or how we can get the most for the least amount of work, we need to figure out how we can best participate in this world-wide discussion.
If the ethics of the blogging world are constantly abused to promote the gain of others, high quality writers will have little motivation to post their works on-line and, as the well slowly dries up, there will be less and less work available for either reuse or for simply reading.
It’s not enough to share; we have to support and reward good content creators. It’s the only way to keep the revolution alive.
Since this article made its appearance on Slashdot, many people have criticized me for allegedly mixing up the terms plagiarism and copyright infringement. This stems from confusion over both the title and the first paragraph of this piece, which were both intended to be hat tips to the articles that inspired me to write about this issue.
The quote is attributed in the very first sentence of the piece. I chose to put quotes around the word "New" instead of the entire title because this kind of content reuse has been going on for some time. There really is little "new" about it. I have modified the title to make it more clear.
Throughout the work I use the terms copyright infringement, reuse and content theft, but never the word plagiarism after the first paragraph. I understand the difference between the terms well and need no lectures.
My hope is that this piece and the attention drawn to it will spark real discussion on a very complicated and intricate issue. Instead, I fear that confusion and misinterpretation may prevent a much-needed debate.
I hope that bloggers, in their haste to chop down the work, will look past the poorly-worded intro and into the issue behind the work, the reason it was pushed in the first place.