Wikipedia, Yahoo Answers Tops for Academic Copying

Jonathan Bailey, November 3, 2011

Earlier today, the people at Turnitin, which is owned by iParadigms, released a report detailing where on the Web students are copying their content from and how those in higher education differ from those in secondary schools (high school and college prep schools).

Turnitin, which currently processes over 40 million papers per year, has a unique bird’s-eye view of the situation, and its statistics, though somewhat limited in their application, are interesting and may be useful to at least some degree.

It is important to note, as Turnitin does, that the service does not actually detect plagiarism and instead only tracks copied content. It is up to a human to determine what is plagiarism and what is legitimately cited. Still, it’s clear from this report that many of the matches are plagiarisms, as there is almost no reason or opportunity to use some of these sources legitimately in an academic environment.

Also, it’s worth noting that the study only looks at content copied from the Web and makes no mention of Turnitin’s other databases, including the ones it has of submitted papers and various academic journals that are not online.

So what did the study find? The results were interesting, but perhaps not very surprising.

How the Study Was Performed

The study looked at some 33.5 million papers (including some 9 million secondary papers and nearly 24.5 million higher education papers) that were submitted between June 2010 and June 2011. In those papers, it found a total of 128 million content matches, just shy of an average of 4 matches per paper. (Reminder: Matches are not necessarily plagiarisms, just verbatim copies, and they can be almost any length, from very short to almost the whole paper.)

The study then traced those matches back to their sources, which were sorted into six categories: Social Networking/Content Sharing, Homework/Academic, Cheat Sites/Paper Mills, News Sites/Portals, Encyclopedias and “Others”.

The results were then tallied, along with information about which sites were the most popular, and were compiled into the infographic below.

What the Study Found

To be honest, there weren’t many surprises in the study. The top site, predictably, was Wikipedia, with 8% of all secondary and nearly 11% of all higher education matches coming from that one site. Yahoo! Answers was second in both lists, with nearly 8% of secondary matches and nearly 4% of all higher education matches.

In fact, of the top ten sites, eight were found on both lists (Wikipedia, Yahoo! Answers, eNotes, Slideshare, Scribd, Oppapers and Amazon all being in the top ten on both). Secondary students also turned to Essaymania and 123HelpMe, whereas higher education students turned more toward CourseHero and Justanswers.

Since many schools have expressly banned the use of Wikipedia for academic research, at least as a source, it seems likely that many, if not most, of those copies are plagiarized. The same holds true for Yahoo! Answers as it is not a site one would normally use as a source in an academic paper.

One striking finding was that higher education students seemed to use a wider variety of sources, pulling more from news and portal sites, as well as from cheat sites and paper mills, than those in secondary schools. However, secondary students pulled three times as much from “other” sites, which included many popular review sites.

My Take

Personally, I didn’t find a great deal surprising about this study, as it matched up well with what I’ve heard anecdotally from talking with teachers and others in the industry. The biggest surprise was to see that higher education students are MORE dependent on Wikipedia than their secondary school peers, but they also showed a greater overall variety in the types of sources they copied from.

I likely would have guessed Wikipedia and Yahoo! Answers on my own as 1 and 2, but I was also a bit surprised to see Amazon, Slideshare and Answers.com ranking as high as they did.

As for the data itself, to me it shows that students, by and large, are not being very creative about where they get their content from. The top 10 sites account for 31% of all matches for secondary students and over 35% of all matches for higher education students. All of the sites involved are heavily indexed and easily searched for in Google, meaning that, while Turnitin can help find matches and makes it easier to do so, teachers relying on Google are likely doing fairly well too, at least when dealing with Web matches.

(Reminder: The study did not look at matches on Turnitin’s internal database and archive, just the service’s archive of the Web.)

Still, the study does show that students are pulling content from a variety of sources, including many flagrantly illegitimate ones such as essay mills and cheating sites. As such, this study hammers home the importance of dealing with plagiarism proactively, something that the report strongly suggests.

Bottom Line

All in all, the study shows what types of sites teachers should expect to find their students copying from, both legitimately and when plagiarizing. Of course, every classroom is going to be different, but this 10,000-foot view of content use can provide at least some information useful to those “in the trenches”.

Though I don’t think many educators will be surprised by these results, they are still interesting and still useful, making them something every educator should be aware of, especially as copying content from the Web becomes more and more common.