Punditry: Turn This In

If you have a Web site, there’s a good chance that your work is somewhere in the Turnitin database. In fact, according to Turnitin, even if you had a Web page and removed it sometime later, there’s a good chance your now-defunct site lives on in their files.

While some see this as harmless, doing little more than what Google, Yahoo and the Web Archive do every day, others see it as something more sinister. Specifically, many are concerned that iParadigms, the maker of Turnitin, is making a hefty profit by using others’ copyrighted works while offering little, if anything, in return.

Whether those concerns are justified, however, is not an easy or direct question to answer. Rather, the issue is muddled by the world of copyright law and accepted practices.

Parasites vs. Symbionts

As a general rule, barring any Creative Commons License or other kind of direct permission, it is legal to use someone’s copyrighted works when the good of the public far outweighs the harm to the copyright holder. Educational uses, criticism and parody have all been upheld so long as a minimal amount is used and the work is attributed.

Commonly accepted practices, on the other hand, allow extra uses when the benefit to the copyright holder far outweighs what is gained by the person using the work. For example, some feel that Google is infringing on copyright law by caching pages for users to search. It is, after all, profiting from other people’s work without explicit permission. However, most feel that Google does more good to copyright holders by driving traffic to them than it gains from the reuse. Furthermore, Google offers Webmasters the ability to remove themselves from the Google database, including archived copies.

Thus, the general practice is that one can reuse a copyrighted work if the use is symbiotic (both parties benefit) rather than parasitic (the person using the work gets all or almost all of the benefit). You can reuse a work, but the copyright holder gets to ask the question "What’s in it for me?"

And figuring out what’s in it for the copyright holder when it comes to Turnitin is very difficult indeed.

The Google Question

With Google, and most search engines, the benefit to the copyright holder is obvious. The search engine gets to be the go-to place to find anything on the Web and the site owner gets tons of free, targeted traffic sent its way. With Turnitin, the benefit is a lot less clear.

The average Webmaster will, most likely, never see any traffic from Turnitin. A professor or instructor running papers through the service is very unlikely to turn up your site, unless you are a paper mill, and is unlikely to be interested in what you have to say even if it does turn up.

On the other hand, Turnitin does offer the extra benefit of protecting the sites in its database against plagiarism. Theoretically, any site held within the Turnitin database cannot be plagiarized by anyone at any institution using the service. However, academic plagiarists, the area Turnitin specializes in, have the least economic impact on their victims of any kind of plagiarist. Still, as Turnitin makes its way into the journalism arena, that may change.

Nonetheless, it seems that Turnitin might have bigger copyright problems than Webmasters angry about caching. Its very clientele seems to be taking exception to its practices.

Students Fight Back

Mike Smit is a graduate student and teaching assistant at Dalhousie University in Halifax, Nova Scotia. Uneasy with the principle of Turnitin, he set up a test account with the service and began performing experiments. He was shocked to find that, despite Turnitin’s claims to the contrary (pdf), the service does retain copies of student essays and sends them out without the student’s permission or knowledge.

To compound the problem, Mike tried to get his Web site excluded from future Turnitin crawls and removed from the Turnitin cache. However, when he discovered that there was no set system to have his work removed, he embarked on an email exchange that lasted over 100 days. This is in sharp contrast to Google, which has a page and several tools dedicated to helping Webmasters remove their work from its database, including archived copies.

Much of this seems to separate Turnitin from the Blake Field v. Google case that Turnitin uses to justify its caching. The inability to opt out, the lack of clear benefit to copyright holders and the direct profit Turnitin receives from its database have left many wondering about the legal and ethical implications of Turnitin.

Caching and Turnitin

Courts have, generally, held that caching Web pages, or storing Web pages at another location, is fair use. Web browsers cache Web pages to speed up surfing, Google caches pages both to let users search through their content and to keep a copy available in case the linked URL stops working or changes, and even some ISPs cache pages to reduce network traffic.

Most caches, however, are designed to help users access the content that’s cached and to help the Webmasters whose content is cached. Google’s cache enables people to find a work, an ISP’s cache not only speeds up access to the work for the user but also reduces the bandwidth costs for the site owner, and a browser cache does the same thing.

However, if a Webmaster has a problem with caching of their content, it’s usually trivial to prevent it. As we learned before, that’s not so with Turnitin. This is trouble because the ability to opt out and the overwhelming benefits to the copyright holder have, according to my reading, been two of the legal cornerstones that have kept caching legal.

But even if the caching is legal, there are ethical implications as well, especially when an organization vowing to fight plagiarism, which is essentially a form of copyright infringement, doesn’t give copyright holders the same courtesy that a search engine does. If one combines that with the aforementioned misleading statements, even if Turnitin is a legal organization, they certainly have lost their moral compass.

What to Do

The solution to the problem is actually quite simple. Turnitin, if nothing else, should offer an opt-out feature for Webmasters, students and others who had no say in whether or not they wound up in the database. Even though nearly all of us will have a robots.txt file that allows the Turnitin crawler, many, if not most, had no idea it even existed.
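For Webmasters who want to see where they stand today, the check is easy to sketch. The snippet below uses Python's standard urllib.robotparser module to test whether a given robots.txt would admit a particular crawler. The user-agent token "TurnitinBot", the example.com URL and the two sample files are assumptions made for illustration. Keep in mind that a Disallow rule only helps if the crawler chooses to honor it, and it does nothing about copies already sitting in a database.

```python
from urllib.robotparser import RobotFileParser


def is_crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A typical "allow everyone" file, like the one most sites effectively have.
PERMISSIVE = """\
User-agent: *
Disallow:
"""

# A file that singles out one crawler (token assumed to be "TurnitinBot")
# while leaving every other crawler untouched.
RESTRICTIVE = """\
User-agent: TurnitinBot
Disallow: /

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    page = "http://example.com/essay.html"  # hypothetical page
    for name, text in (("permissive", PERMISSIVE), ("restrictive", RESTRICTIVE)):
        allowed = is_crawler_allowed(text, "TurnitinBot", page)
        print(f"{name}: TurnitinBot allowed = {allowed}")
```

The second file shows the kind of per-crawler exclusion that search engines already respect; whether any given plagiarism crawler does the same is something each Webmaster would need to verify from their own server logs.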

The problem with making opting out easy, for a plagiarism database at least, is pretty straightforward. If anyone can exclude themselves freely, what’s to stop essay sites from simply opting out and making the service almost useless? On the other hand, if no one can opt out, Turnitin might find itself on the wrong side of a copyright infringement lawsuit.

In the end, the only reason that Turnitin has avoided more scrutiny from Webmasters is that it’s such an unknown. It’s been quietly collecting data for years now and only a few have been any the wiser.

That, however, is coming to an end. As more Webmasters get savvy to copyright and to Turnitin, it’s only a matter of time before questions and challenges are raised. Turnitin believes that it’s within the bounds of copyright law. Odds are, we’ll find out what the courts have to say soon enough.
