Study Chronicles DMCA Abuses

By Jonathan Bailey • Nov 23rd, 2005 • Category: Articles, DMCA, Legal Issues, News

Jennifer Urban of USC’s Intellectual Property Legal Clinic and Laura Quilter of the Samuelson Clinic at the University of California, Berkeley, released a summary report of a study they’ve been working on regarding DMCA takedown notices. The study, which is due out in March 2006, finds that some 30% of all DMCA notices are flawed and, potentially, illegal.

While the study’s results are shocking to many who were not aware of the adverse impact the DMCA has had since its passage in 1998, many who have studied copyright law, myself included, have found it to be a reaffirmation of what they already knew.

However, flaws in the study call many of its results, and the conclusions drawn from them, into question. So, though already heralded by copyleft and civil liberty advocates, the impact the study will have is very much up in the air.

Background

Section 512 of the DMCA, where the notice and takedown provision is listed, was part of a compromise between copyright advocates and online service providers. In short, providers could not be held liable for copyright infringement on their services if they met certain requirements, one of the key ones being the expeditious removal of infringing works from their servers.

The law was cheered by providers and copyright holders alike. Copyright holders gained a quick and easy means of shutting down infringement on the Web, and providers were shielded from the threat of copyright infringement suits without being burdened with scanning their services for infringing materials. However, many began to wonder if the law would be abused, mainly to stifle free speech or legitimate reuse of work.

Urban’s study aimed to determine exactly what the rate of abuse was and reached some startling conclusions.

Findings

The study’s key findings can be summarized in these paragraphs taken from the second page of the report:

Thirty percent of notices demanded takedown for claims that presented an obvious question for a court (a clear fair use argument, complaints about uncopyrightable material, and the like);

Notices to traditional ISP’s included a substantial number of demands to remove files from peer-to-peer networks (which are not actually covered under the takedown statute, and which an OSP can only honor by terminating the target’s Internet access entirely); and

One out of 11 included significant statutory flaws that render the notice unusable (for example, failing to adequately identify infringing material).

In addition, we found some interesting patterns that do not, by themselves, indicate concern, but which are of concern when combined with the fact that one third of the notices depended on questionable claims:

Over half—57%—of notices sent to Google to demand removal of links in the index were sent by businesses targeting apparent competitors;

Over a third—37%—of the notices sent to Google targeted sites apparently outside the United States.

The study goes on to find that, on average, the number of takedown notices being filed is increasing over time, with the growth in notices to search engines outstripping other kinds. It also found that the vast majority of DMCA notices are sent in by corporations and businesses, but not the movie and music industries, who lobbied so hard for the law. Instead, according to the study, they send the bulk of 512(a) notices where takedown is neither possible nor required, for example when works are distributed over file sharing networks.

Finally, the study finds that the counter-notice, the means by which someone who has been wrongfully targeted by a DMCA notice can get their material restored, is almost never used. The researchers found only seven counter-notices in their sample and almost no instances of put-back.

In short, the study itself is an excellent read and, if you haven’t done so, you should take a look at it now. It has a wide variety of statistics about who is filing notices, who is targeted, what the notices are for and, oftentimes, what is wrong with them.

Flaws

However, the study has several flaws, most of which are discussed in the summary, that most likely affect both its results and their credibility.

First, the sample size for the study, approximately 900 notices, is far too small. With hundreds, if not thousands, of notices being filed per week, 900 over the course of three years does not provide a large sample and may be prone to bias.
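As a rough illustration of what a sample of this size can and cannot show, the sketch below computes the sampling margin of error for a 30% finding among 900 notices. It assumes a simple random sample and uses the standard normal approximation; neither assumption comes from the report, and the function name `margin_of_error` is purely illustrative. It measures random sampling error only and says nothing about bias in how the notices were collected.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval
    for a proportion p observed in a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

n = 900    # approximate number of notices in the study
p = 0.30   # share of notices flagged as questionable

moe = margin_of_error(p, n)
print(f"+/- {moe * 100:.1f} percentage points")
```

Under those assumptions the figure comes out to about three percentage points either way, so how the notices were selected matters at least as much as how many there were.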

Second, the study only sampled from ChillingEffects.org. The site, which collects DMCA notices and C&D letters, gets the vast majority of its content from Google, which posts all of its notices there, and from user submissions. As the report put it, such a sample “may skew towards the substantially flawed,” since the majority of Google’s notices deal with the search engine, not the various hosting services it offers, and, thus, are more prone to abuse. Also, notices that are flawed, whether legally or technically, are more likely to be submitted to ChillingEffects.org for both review and documentation.

Finally, the study makes no attempt to distinguish between accidental and intentional misuse of the law. Though the study does state that both targets and senders often have a limited understanding of copyright law, there are no statistics on the issue. Though this would be something very difficult to study, it’s a crucial piece of the puzzle when trying to understand how the DMCA is being used.

The researchers, however, have promised to gather further data, in particular from large hosts such as ThePlanet, and to attempt to gain a larger, less biased sample. Until then, the findings of the study, though interesting, should be taken with healthy skepticism, if for no other reason than the practice of good research.

Personally

I personally had one DMCA notice included in the sample. Since it was one I faxed, I can’t say for certain whether it fell among the roughly one in 11 (about 9%) with statutory flaws, as it used an old template and may not have had my full address. (Note: I’ve since checked my information and it appears to have been complete.) However, I can say for certain that the claim itself was legitimate and within the bounds of copyright law (it was, predictably enough, a plagiarism issue).

Though it’s exciting to be a part of a major study on the DMCA, I find it very hard to take the results at face value. Even though the results of the study mirror my own conversations with Web hosts and others in the field, the potential for skew is just too great to ignore.

However, I think the most important thing is that this study not be used to attack everyone who uses the DMCA. If 30% of all DMCA notices have legal problems, then 70% don’t. This means that, for every misuse (accidental or intentional) of the DMCA, there have been more than two legitimate (or positive) uses of it.
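The arithmetic behind that comparison can be checked with a short sketch. It assumes, as the paragraph above does, that every notice not flagged as questionable is a legitimate one; the variable names are illustrative only.

```python
from fractions import Fraction

flawed_share = Fraction(30, 100)  # share of notices flagged as questionable
sound_share = 1 - flawed_share    # the rest, assumed here to be legitimate

# Number of unflagged notices for every flagged one
ratio = sound_share / flawed_share
print(f"{float(ratio):.2f} unflagged notices per flagged notice")
```

That works out to 7/3, or about 2.33 unflagged notices for each flagged one.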

It would be very harsh for copyleft advocates, many of whom support file sharing where only 10% of the traffic is legal, to blindly condemn everyone who uses DMCA notices, even those who do so begrudgingly and legally, when nearly 70% of such notices are within the bounds of the law.

I’m no fan of the DMCA and I use it only because I have to. Still, I don’t want to see small copyright holders trying to protect their work from plagiarism be condemned simply because a minority have abused the same system.

Suggestions

The last line of the summary said that the full paper, due out in four months, will discuss proposals for change. In the meantime though, I would like to offer a few of my suggestions on the matter.

Note: These suggestions are ideas for working within the bounds of the DMCA. The simplest solution, of course, is to simply repeal it and start over.

Eliminate 512(d): 512(d), which requires search engines to remove links to infringing material, is the most commonly abused provision since it can cripple a site’s ability to promote itself. Though useful in instances where an international host is uncooperative, its scope of usefulness is far outstripped by its scope for abuse. When you can get someone removed from the search engines without notification, warning or recourse, there’s just too much potential for misuse. Search engines do not host content and should not be the copyright police of the Web. It’s an unfair burden with too many risks.

Offer Damages for False DMCA Notices: Filing a false DMCA notice is already an act of perjury. However, the law specifies no damages that can be obtained if one is hit with a malicious notice. Though the likelihood of someone suing for such an offense is slim, the possibility would add an extra deterrent.

Make DMCA Notices Public: Just as providers are required to submit information to the Copyright Office regarding their designated agent, they should also send in the DMCA notices that they act upon. A public library of DMCA notices would not only make future research easier, but also eliminate one of the biggest complaints about the current system: secrecy.

Give Providers More Leeway: Under the previous system, when copyright holders had to work with abuse teams and not deal with the DMCA, providers investigated claims of copyright infringement and threw out those that were clearly baseless. The DMCA should offer a similar provision. Though no one should expect hosts to make determinations about fair use or other complicated copyright questions, the fact that the host cannot say no, even when the claim is outrageous, is very troubling.

In the End

The report is very interesting and has some great information. However, there’s no real meat for me to sink my teeth into. The flaws in the report only seem to undermine conclusions that match what I have observed. These aren’t the papers that I want to wave around while demanding that Congress reconsider the DMCA, and this isn’t the report that’s going to change everything.

Still, that might be coming and we’ll have these two researchers to thank.


Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • Jim
    I am looking for confirmation from the authors as to whether their sample includes all Google notices and how that has been verified with Google. Just based on the numbers, it seems unlikely (maybe impossible) that this is Google's entire set of DMCA notices over the three year period. For example, how could Google have such a low number compared to The Planet?

    This is a critical point, and it would be much appreciated if the authors could address it directly and verify the source for this.

    Thanks
  • Jennifer,

    First off, thank you very much for your thoughtful reply. I'm glad that you were able to clarify a few points and you answered several questions that I had on the study.

    My main problem with the sample being so Google heavy is that, in my personal experience, Google is a target for misguided DMCA notices. In fact, I warn heavily on my site against sending notices to Google for just that reason. In short, if I were going to use the DMCA maliciously to attack a competitor or a detractor, I would do it by filing a notice through Google, not their host and, since Google is so big, I wouldn't waste time with any other search engines. 512(d) is just too ripe for abuse and Google is the main target for such abuse.

    Also, Google's main hosting service, Blogger, is more prone than most hosts to "hostile" DMCA notices. Blogging is one area where fair use copying, commentary, strong opinions and corporate enemies make it a ripe minefield for DMCA abuses. I would imagine that traditional hosts, like ThePlanet, will probably see fewer such notices.

    I do agree that the main skew comes from the voluntarily submitted collection, however, there are elements of Google's business that make it a more attractive target for malicious and erroneous notices.

    I'm going to send you an email in a second regarding the coding. I have a few questions that I want to be off the record as I don't want to inadvertently mess with your carefully crafted system. I know how hard researchers work to develop coding techniques (having done journalism research projects in the past).

    I think it'll be interesting though to see if Google's results stack up to other hosts or if there is a skew. Honestly, I'm guessing based upon my own experience dealing with various hosts. One thing research has taught me is that anecdotal evidence can be wrong.

    I hope that you have a happy Thanksgiving!
  • Jennifer Urban
    Thanks for the very thoughtful comments!

    First, I just wanted to note that your presentation of the flaws misses just a bit and to clarify that. Because Google sends everything, the Google notices don't skew as unreliable within the Google sample. It's just that Google is only one, albeit important, company, and that it provides primarily search services. (Though it provides more hosting all the time and, in fact, 1/3 of the notices from Google we studied are hosting notices.) Comparison among companies is needed, here, so we weren't able to say much about notices outside our sample.

    The skew you noted actually affects the other --non-Google-- notices. This is because the vast majority of these (by the way a much smaller number--Google provides 84% of the sample) are self-reported to Chilling Effects. (The Internet Archive submits notices as well as Google, but not enough to affect the sample.) We note that people who submit notices are more likely to at least think they're in the right. So, we use these very, very carefully, and draw no general conclusions from them. We do speculate about 512(a), but that's backed up with confidential information from other ISPs.

    On the coding, I will note that we were very careful when we put a notice into the "flawed" category--we did it only in situations where there was a clear defense or other problem. For example, we didn't code for fair use based only on using just a little bit, though that might be fair use. We stuck to parody, commentary, etc.--the favored uses--and to clear-cut issues with copyrightability. Additionally, we couldn't always find out enough information about a notice to decide, so those were not counted as flawed. Sadly, I fear we may have undercounted flaws. So, it's not that at least 70% are okay, it's that at least 30% are questionable in our sample.

    The main issue with studying 512 notices, of course, is that they are private--it's been quite difficult to obtain information from OSPs. If we had a randomized sample (and actually 900 or so would probably be fine if randomized, statistically), we'd know more. Still, the fact that we have very good data from at least one company does mean that we were able to make a start.

    I want to especially thank you for the thoughtful ideas on reform. We've kicked about all but the last one, I think.
  • Jim
    This is an important topic, but it's hard to believe that anyone with a sense of scientific discipline would even call it a "study" much less the basis for any policymaking.