Review: The Plagiarism Checker

Jonathan Bailey, December 16, 2008

Late last week, a post reached the front page of Reddit that piqued the curiosity of copyright holders, teachers and professors alike. It was about a service called “The Plagiarism Checker” (dubbed by me the “Dustball” checker due to its domain), created by Brian Klug in 2002, when he was a student at the University of Maryland at College Park, and abandoned until earlier this year.

The site, according to Klug, was getting about 2,000 visits per day when it was forgotten, but it is almost certainly doing much better now that it has taken off, attracting countless tweets and other social news attention. Librarians and teachers are especially captivated by the site.

But is “The Plagiarism Checker” worth using? Is it as powerful a tool as some, although not the site itself, have made it out to be? The sad answer is no, but it could, with a few simple tweaks, become a much more useful service for teachers and bloggers alike.

How it Works

The basic premise of the minimalist site can be summed up by its instructions:

Cut & paste your students paper or homework assignment into the box below, and click the “check” button. This free plagiarism detector will find plagiarized text in homework and other essays/reports.

In short, you take an essay, article or other lengthy prose work, paste it into a textbox and hit “check”. From there, the site extracts several strings of text, runs them through Google and compiles the result, determining whether plagiarism is probable.
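For readers curious what that process might look like in code, here is a minimal Python sketch. It is not Klug's implementation, which I have not seen. The function names, the snippet count, the minimum sentence length and the 50% threshold are all my own assumptions, and the search step is a placeholder callable (`search_hits`) that you would have to supply yourself.

```python
import re
from typing import Callable, Dict, List


def pick_snippets(text: str, count: int = 5, min_words: int = 8) -> List[str]:
    """Pick a few evenly spaced sentences long enough to be distinctive.

    The count and min_words defaults are illustrative assumptions.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    if len(candidates) <= count:
        return candidates
    step = len(candidates) / count
    return [candidates[int(i * step)] for i in range(count)]


def check_text(
    text: str,
    search_hits: Callable[[str], int],  # hypothetical: returns a hit count for a query
    count: int = 5,
    threshold: float = 0.5,  # assumed cutoff for calling plagiarism "probable"
) -> Dict[str, object]:
    """Search each snippet as an exact phrase and summarize the matches."""
    snippets = pick_snippets(text, count)
    if not snippets:
        raise ValueError("text too short to check")
    # Wrap each snippet in double quotes to request an exact-phrase match.
    matched = [s for s in snippets if search_hits('"' + s + '"') > 0]
    ratio = len(matched) / len(snippets)
    return {"snippets": snippets, "matched": matched, "probable": ratio >= threshold}
```

Passing the search function in as a parameter keeps the sketch independent of any particular search engine or API. To try it without a network connection, a stub such as `lambda query: 0` is enough.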

In that regard, the idea is actually very similar to Copyscape, which also uses Google, via its API, to process results. However, where Copyscape keeps the “magic” hidden from the user, the “Dustball” plagiarism checker includes links to the Google results, encouraging users to click through and research the case for themselves.

That alone is a big part of the problem Webmasters, and many teachers, will have with the service. Where Copyscape, as well as academic tools such as TurnItIn, provide very simple and colorful results, The Plagiarism Checker takes a very bare-bones approach, requiring users to perform a large amount of research on their own.

Still, a bit of research would be welcome if the service produced great results. Unfortunately, its performance seems lukewarm at best.

My Tests

To test the service, I decided to run it through a battery of tests similar to the one I had put Copyscape through before watching Copyscape improve upon its initial results.

The first test was to run an old poem of mine through the system, one that allegedly has over 300 matches in Google. However, that test was thwarted as The Plagiarism Checker refused to even look at the work, saying that it could not function with such short text strings.

I then shifted gears and started using prose works, the first being one that had 36 matches in Google at the time I did the search. The result was stunning.

Despite the fact that Google had reported three dozen matches on test snippets from the work itself, the “Dustball” checker was unable to find anything. To make matters worse, by searching for some of the sample quotes the checker itself had used, such as the first one, I was able to locate other copies of the work.

Clearly, The Plagiarism Checker was missing results that Google was finding, meaning it was discarding them for whatever reason.

A similar test for another prose work only returned one sentence that was matched against anything and the results for it were all false positives. This work, in Google, has six results.

The only search using the service that seemed to work remotely well was when I ran the Declaration of Independence through it. Every search term, in this test, came back positive.

It appears that text that is not widely distributed around the Web may or may not show up as plagiarized with this service, something that has me very worried, as many are starting to rely on this plagiarism checker as their main tool for detecting both copyright infringement and plagiarism by students.

The Sad Truth

Simply put, any and all of these searches should have come back as plagiarized. Even if there were no other matches of the content, these works exist on my site and are available through Google there. There is no reason that any of these works should have come back as anything short of 100% plagiarized, since the site cannot know that I was the one submitting them.

For teachers, this is not good news. If a student plagiarizes material from obscure sources, they are likely to escape detection. Likewise, Webmasters and others who might want to use this tool to track their own content will likely be disappointed that it doesn’t seem to pick up infringement when the copies are limited to only a few dozen sites.

This can most likely be fixed through tweaks in the algorithm, but as it sits right now, it doesn’t appear that it has much to offer teachers or Webmasters, especially when Copyscape is relatively effective and cheap to use.

Simply put, at this moment, Copyscape is easier, more effective and faster than The Plagiarism Checker and, at only five cents a search, is affordable too.

However, the best technique still appears to be taking the time to select good phrases from a work and manually searching for those. It returns the most results and seems to work well nearly all of the time.
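Part of that manual process, choosing which phrases to search for, can be roughed out in code. The Python sketch below is only one possible heuristic and entirely my own assumption: it scores each run of words by how many are not on a small list of common words, then returns the top non-overlapping runs wrapped in quotes for exact-match searching. The word list, the phrase length and the function name are illustrative, and punctuation is dropped from the phrases for simplicity.

```python
import re
from typing import List

# A deliberately tiny stop-word list; a real one would be much longer.
COMMON = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "was", "for", "on", "with", "as", "at", "by", "be", "this",
}


def best_phrases(text: str, words_per_phrase: int = 8, top: int = 3) -> List[str]:
    """Return up to `top` quoted phrases containing the most uncommon words."""
    words = re.findall(r"[A-Za-z']+", text)
    scored = []
    for start in range(max(len(words) - words_per_phrase + 1, 0)):
        window = words[start:start + words_per_phrase]
        score = sum(1 for w in window if w.lower() not in COMMON)
        scored.append((score, start, window))
    # Highest score first; earlier position breaks ties.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen: List[str] = []
    used_starts: List[int] = []
    for _score, start, window in scored:
        # Skip windows that overlap a phrase already chosen.
        if all(abs(start - u) >= words_per_phrase for u in used_starts):
            chosen.append('"' + " ".join(window) + '"')
            used_starts.append(start)
        if len(chosen) == top:
            break
    return chosen
```

The quoted strings can then be pasted into a search engine one at a time. A human eye will still usually pick better phrases than a word-count heuristic, which is much of the reason the manual approach works so well.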

The Big Picture

My issue with The Plagiarism Checker has less to do with the service itself and more to do with how others have been promoting it. The site itself is actually fairly humble about what it can do, but bloggers and Twitter users have been advertising it as if it were a silver bullet to detect plagiarism. Clearly, that is not the case.

With a few tweaks and fixes to the algorithm, I don’t doubt that this service, much like Copyscape, could become a very powerful tool. However, even if the results were on par with Copyscape, the latter remains faster and easier to use, meaning that there will not be much reason to use the “Dustball” checker.

To make matters worse, most teachers and professors have access to services such as TurnItIn that are far more accurate and cover a much larger breadth of sources than “The Plagiarism Checker”. Considering the ease of use and added features, there is not much that can be gleaned from a Google-only search that can’t be gleaned from the more automated service (though Copyscape did top TurnItIn in a recent plagiarism detection study).

In short, I don’t see much usefulness for this tool, even if its accuracy improves, and I am more than a little confused as to why so many seem to have promoted it so heavily.

Conclusions

More than anything, this is a case against the reliance on any one plagiarism checking service. Even the best services will let results slip through the cracks. Furthermore, just because a service is popular does not mean that it should be trusted above all.

However, I find it very difficult to fault The Plagiarism Checker for this confusion and these problems. It is clear that the service was as much an experiment as anything: it is promoted humbly and was actually abandoned for approximately six years. It was others, perhaps desperate for some way to more effectively detect plagiarism, who gave it an unjustified reputation.

If anything, this case shows the need and the potential market for such services and illustrates why some companies have made millions in this field. People are eager for a solution and are excited by any promise of one.

Sadly though, this site is not the one people are looking for.