Review: The Plagiarism Checker

Late last week, a post reached the front page of Reddit that piqued the curiosity of copyright holders, teachers and professors alike. It was about a service called “The Plagiarism Checker” (dubbed by me the “Dustball” checker due to its domain), created by Brian Klug in 2002, when he was a student at the University of Maryland at College Park, and left abandoned until earlier this year.

The site, according to Klug, was getting about 2,000 visits per day when it was forgotten but is almost certainly doing much better now as it has taken off, attracting countless Twitter Tweets and other social news attention. Librarians and teachers are especially captivated by this site.

But is “The Plagiarism Checker” worth using? Is it as powerful a tool as some, although not the site itself, have made it out to be? The sad answer is no, but it could, with a few simple tweaks, become a much more useful service for teachers and bloggers alike.

How it Works

The basic premise of the minimalist site can be summed up by its instructions:

Cut & paste your students paper or homework assignment into the box below, and click the “check” button. This free plagiarism detector will find plagiarized text in homework and other essays/reports.

In short, you take an essay, article or other lengthy prose work, paste it into a textbox and hit “check”. From there, the site extracts several strings of text, runs them through Google and compiles the result, determining whether plagiarism is probable.
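To make that workflow concrete, here is a minimal sketch of the extract, search and compile steps in Python. It is not the site's actual code, which I have not seen: the function names, the evenly spaced sampling, the 8-word phrase length and the 50% threshold are all assumptions of mine, and `search` is a hypothetical callable standing in for whatever search backend is used.

```python
from typing import Callable, Dict, List


def extract_phrases(text: str, phrase_len: int = 8, count: int = 5) -> List[str]:
    """Pick up to `count` evenly spaced runs of `phrase_len` words (assumed strategy)."""
    words = text.split()
    if len(words) < phrase_len:
        raise ValueError("text is too short to sample phrases from")
    last_start = len(words) - phrase_len
    count = min(count, last_start + 1)
    if count == 1:
        starts = [0]
    else:
        # Spread the starting positions from the first word to the last usable one.
        starts = [i * last_start // (count - 1) for i in range(count)]
    return [" ".join(words[s:s + phrase_len]) for s in starts]


def check_text(
    text: str,
    search: Callable[[str], int],
    phrase_len: int = 8,
    count: int = 5,
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Search each sampled phrase as an exact quote and compile a verdict.

    `search` is a hypothetical function that takes a query string and
    returns the number of results it found.
    """
    phrases = extract_phrases(text, phrase_len, count)
    results = [(p, search('"' + p + '"')) for p in phrases]
    matched = [p for p, hits in results if hits > 0]
    ratio = len(matched) / len(phrases)
    return {
        "phrases": phrases,
        "matched": matched,
        "probable_plagiarism": ratio >= threshold,
    }
```

A caller would pass in its own `search` function, for example one that wraps a search API it has access to, and then review the `matched` phrases by hand.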

In that regard, the idea is actually very similar to Copyscape, which also uses Google, via its API, to process results. However, where Copyscape keeps the “magic” hidden from the user, the “Dustball” plagiarism checker includes links to the Google results, encouraging users to click through and research the case for themselves.

That alone is a big part of the problem Webmasters, and many teachers, will have with the service. Where Copyscape, as well as academic tools such as TurnItIn, provide very simple and colorful results, The Plagiarism Checker is a very bare-bones approach, requiring the user to perform a large amount of research on their own.

Still, a bit of research would be welcome if the service produced great results. Unfortunately, it seems that its performance is lukewarm at best.

My Tests

To test the service, I decided to run it through a battery of tests similar to the one I had put Copyscape through, after which I watched Copyscape improve upon its initial results.

The first test was to run an old poem of mine through the system, one that allegedly has over 300 matches in Google. However, that test was thwarted as The Plagiarism Checker refused to even look at the work, saying that it could not function with such short text strings.

I then shifted gears and started using prose works, the first being one that had 36 matches in Google at the time I did the search. The result was stunning.

Despite the fact Google had reported three dozen matches on test snippets from the work itself, the “Dustball” checker was unable to find anything. To make matters worse, using some of the sample quotes from the test, I was able to locate other copies of the work, such as with the first quote.

Clearly, The Plagiarism Checker was missing results that Google was finding, meaning it was discarding them for whatever reason.

A similar test for another prose work only returned one sentence that was matched against anything and the results for it were all false positives. This work, in Google, has six results.

The only search using the service that seemed to work remotely well was when I ran the Declaration of Independence through it. Every search term, in this test, came back positive.

It appears that text that is not widely distributed around the Web may or may not show up as plagiarized in this service, something that has me very worried as many are starting to rely on this plagiarism checker as their main tool for detecting both copyright infringement and the plagiarism of students.

The Sad Truth

Simply put, any and all of these search results should have come back as being plagiarized. Even if there were no other matches of the content, these works existed on my site and are available through Google there. There is no reason that any of these works should have come back as anything short of 100% plagiarized, since this site cannot know I was the one submitting them.

For teachers, this is not good news. If a student plagiarizes material from obscure sources, they are likely to escape detection. Likewise, Webmasters and those who might want to use this tool to track their own content will likely be disappointed that it doesn’t seem to pick up infringement when it is limited to only a few dozen sites.

This can most likely be fixed through tweaks in the algorithm, but as it sits right now, it doesn’t appear that it has much to offer teachers or Webmasters, especially when Copyscape is relatively effective and cheap to use.

Simply put, at this moment, Copyscape is easier, more effective and faster than The Plagiarism Checker and, at only five cents a search, is affordable too.

However, the best technique still appears to be taking the time to select good phrases from a work and manually searching for those. It returns the most results and seems to work well nearly all of the time.
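For those who would rather not build each query by hand, the manual approach can be sketched as a small Python helper that picks a few sentences and turns them into exact-phrase Google search links to open in a browser. The function names are hypothetical, and scoring sentences by average word length is only a crude stand-in of my own for the judgment involved in selecting "good phrases"; a human eye will usually choose better.

```python
import re
from typing import List
from urllib.parse import quote_plus


def candidate_sentences(text: str, min_words: int = 6, max_words: int = 20) -> List[str]:
    """Split text into sentences and keep those of a searchable length."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if min_words <= len(s.split()) <= max_words]


def distinctiveness(sentence: str) -> float:
    """Assumed heuristic: longer words on average suggest a less generic phrase."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return sum(len(w) for w in words) / len(words) if words else 0.0


def search_urls(text: str, top_n: int = 3) -> List[str]:
    """Return exact-phrase search URLs for the top-scoring sentences."""
    ranked = sorted(candidate_sentences(text), key=distinctiveness, reverse=True)
    urls = []
    for sentence in ranked[:top_n]:
        phrase = sentence.rstrip(".!?")
        # Wrapping the phrase in quotes asks the search engine for an exact match.
        urls.append("https://www.google.com/search?q=" + quote_plus('"' + phrase + '"'))
    return urls
```

Each returned link can then be opened and reviewed manually, which keeps the final decision about what counts as a match with the person doing the checking.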

The Big Picture

My issue with The Plagiarism Checker has less to do with the service itself and more to do with how others have been promoting it. The site itself is actually fairly humble about what it can do, but bloggers and Twitter users have been advertising it as if it were a silver bullet to detect plagiarism. Clearly, that is not the case.

With a few tweaks and fixes to the algorithm, I don’t doubt that this service, much like Copyscape, could become a very powerful tool. However, even if the results were on par with Copyscape, the latter remains faster and easier to use, meaning that there will not be much reason to use the “Dustball” checker.

To make matters worse, most teachers and professors have access to services such as TurnItIn that are far more accurate and cover a much larger breadth of sources than “The Plagiarism Checker”. Considering the ease of use and added features, there is not much that can be gleaned from a Google-only search that can’t be gleaned from the more automated service (though Copyscape did top Turnitin in a recent plagiarism detection study).

In short, I don’t see much usefulness for this tool, even if its accuracy improves, and I am more than a little confused as to why so many seem to have promoted it so heavily.

Conclusions

More than anything, this is a case against the reliance on any one plagiarism checking service. Even the best services will let results slip through the cracks. Furthermore, just because a service is popular does not mean that it should be trusted above all.

However, I find it very difficult to fault The Plagiarism Checker for this confusion and these problems. It is clear that the service was as much an experiment as anything. It is promoted humbly and was actually abandoned for approximately six years. It was others, perhaps desperate for some way to more effectively detect plagiarism, who gave it an unjustified reputation.

If anything, this case shows the need and the potential market for such services and illustrates why some companies have made millions in this field. People are eager for a solution and are excited by any promise of one.

Sadly though, this site is not the one people are looking for.

27 Responses to Review: The Plagiarism Checker

  4. Hi Jonathan,

    just ran a few of our tests through. A 100% plagiarism raised two concerns (both correct), but only because this 52 or so letter slice did not contain any Umlauts….

    When you do get a hit, you still have to leaf through the possible source to look for the place the plagiarism was taken from. Copyscape does a nicer job of markup.

    It does not find multiple sources and misses some of the more esoteric ones completely. Not a good workflow fit, and searching with Google is faster.

    • Agreed completely on all fronts. I can see what the creator was trying to do but between the lackluster matching and the poor workflow, there isn't much here for anyone to sink their teeth into.

  5. Amy says:

    I came to the same conclusions when I sampled it. Thank you for this detailed write-up. I am always looking for great plagiarism tools that don’t violate FERPA (Turnitin, IMHO).

    • I had suspected it from the moment I laid eyes on it but still wanted to test it out thoroughly before I tossed it aside, I'd hate to pass over the greatest plagiarism checking tool just because of an intuition. Sadly, it appears I saw right.

  7. Roman says:

    I applied for copyright certificates for 3 of my works (greeting cards). The first work was submitted to US Copyright Office on December 10, 2007. The second & third were submitted at the end of May, 2008. It normally takes 4 months to issue a copyright certificate. However, I still haven't gotten any responses. What do I do? Any suggestions? Is this the way a Federal Agency supposed to operate? I mean, just because the office receives 10,000 works (in one day) from copyright applicants, does not grant the office the right to mess me up! Does it? Anyone else have a similar situation?

    • The best thing that I can tell you to do is to call them and ask them what is going on. They have their numbers here: http://www.copyright.gov/help/

      I'm the first to agree that the USCO is a bag of hurt and doesn't function even remotely as it should, but I sadly can't help shed any light on this matter. The few times I've done it I've only had one major issue and that was caused by their radiation equipment destroying CDs I sent in.

      My advice is to get in touch with them either via email or phone and see what they can do. They do respond, at least the few times I've contacted them.

  9. [...] plagiarism checkers using Google and other search engines. Most of these plagiarism checkers, such as the “Dustball” checker, fail to produce adequate [...]

  10. linda says:

    Puritan Pride

    In Nathaniel Hawthorne's “The Scarlett Letter”, he documents a time in history when the Puritanism doctrine was understood as a way of life for the Puritan religion. Although it was not agreed upon by all, many used and upheld this religion to guide them throughout their daily living. Hawthorne, growing up in the 19th Century, directs his attention to an earlier time in the Puritan religion, when men and women had a place to which he or she is to remain noble. This novel conveys multiple perspectives to describe a religion, a people, and a time when no considerations of humanity rose above religion. Psychologically, the historical and feminist views weigh in a degree of selfishness, followed by repentance, and forgiveness.

    Nathaniel Hawthorne born in 1804, in Salem Massachusetts to a Puritan religion, is the sixth generation of his family in Salem. As a young boy he suffers a leg injury, not able to move about, he occupies himself with reading and thought. Two sisters and the only son, he is an idol to the girls in his life, including his mother. Inspired by his uncles’ riches, the Mannings, he attends Bowdoin college in Brunswick, Maine in 1821. There he indulges in English composition and befriends Henry Longfellow, Franklin Pierce and Horatio Bridges, who remains friends throughout their lives. He returns to Salem in 1825 (Hawthorne, 2004). Living with his mother, he often felt loneliness and out of touch with where he felt he should be in life, but also at this time he learns to write tales and sketches. He marries Sophia Peabody in 1842, they moves to Concord, Massachusetts where their daughter Una is born. In 1846, Julian, his son, is born after moving back to Salem in 1845.His works consist of short stories to novels to which his first novel, Fanshawe, is secretly published at his expense in 1828 (Hawthorne, 2004). The novel most recognized and extensively analyzed for its interpretations is “The Scarlet Letter”. Hawthorne began writing this novel vigorously after he is dismissed from his job as a surveyor. Hawthorne's influences were of many, his Puritan background, and maybe the history of his ancestors of whom his great-great-grandfather (John Hathorne) served as a judge in the witch trials that condemns people to death who were believed to be witches. Researching his family's history, Hawthorne writes The Scarlett Letter from a personal perspective of the Puritan religion and their strict standings. He creates characters of every sense, good and bad, showing the human demoralization where sin breaks free but does not go without punishment. Because of his views on good and evil, his character Reverend Dimmesdale is one of his own views.

    The Scarlet Letter is about Hester Prynne, the protagonist, the novel is not so much a consideration of her innate character as it is an examination of the forces that shape her and the transformations those forces effect. There is not much known about Hester prior to her affair with Dimmesdale and her resultant public shaming. She marries Roger Chillingsworth, one whom she did not love, and the story never reveals why this marriage exists. Before her marriage, Hester was a strong-willed and spontaneous young woman, she remembers her parents as loving guides who frequently had to remind of her HeThe genius of Nathaniel Hawthorne is evident in “The Scarlet Letter”, to which he writes to examine the Puritan religion in the 17th century. This story is about sin, punishment, and forgiveness where one sin is greater than another. This novel conveys three perspectives in its characters and writer in a historical time when feminist views are few creating psychological interpretations

    Nathaniel Hawthorne, born in 1804

    Additional paragraphs.

  11. waqas77 says:

    The articles describes the Calendar are cyclical anomalies in returns, where the cycle is based on the calendar. The most important calendar anomalies are the January effect, the turn of the month effect and the weekend effect. The quality of studies finding evidence of different market anomalies are too overwhelming to simply ignore and just write off as temporary miss pricings according to efficient market theory.The analysis has been conducted to evaluate that investor have to be critical not to over interpret results with the risk of neglecting and under estimating the importance of such a basic concept as portfolio diversification. This journal shows the significance of calendar trading rules is much weaker when it is assessed in the context of global rules that could reasonably have been evaluated. Evidence provided that daily abnormal returns in January have large means relative to remaining eleven month and small firms experience large returns in January and exceptionally large returns during the first few trading days of January and it is also closely associated to the tax loss selling induced by negative returns over the previous years.Where as, the week end effect refers to tendency of stocks to exhibit relatively large returns on Fridays compared to those of Monday. Monday effect is associated to the regularities in trading pattern of individual and institutional investor related to the day of the week. We find a relative increase in trading activity by individual on Mondays, in addition, there is no tendency to increase the number of sell transactions relative to buy transaction, which might be due to information which individuals collects over the week end.Last it’s the month end effect, that existence of positive returns only in the first half of the month, and more specifically where the last day of one month and the first three of the next month are particularly high. 
Which primarily due to higher month end cash flows such as salaries, dividend and interest payments.The journal acknowledge that there might exit short term anomalies but that these will in a longer perspective be cancelled out so that the market can go back to being perfect efficient. There is no gurantee that markets will be perfect efficient in the short run however as an investor specialized Technical analysis uses from past patterns of price and the volume of trading as the basis for predicting future prices in detecting anomalies and arbitrage opportunities will not be able to attract any abnormal returns due to irregular nature of these anomalies. The random-walk evidence suggests that prices of securities are affected by news. Favorable news will push up the price and vice versa. It is therefore appropriate to question the value of technical analysis as a means of choosing security investments. Fundamentals analysis involves using market information to determine the essential value of securities in order to identify those securities that are undervalued. However semi strong form market efficiency suggests that fundamentals analysis cannot be used to outperform the market. In an efficient market, equity research and valuation would be a costly task that provided no benefits. The odds of finding an undervalued stock should be random (50/50). Most of the time, the benefits from information collection and equity research would not cover the costs of doing the research.For optimal investment strategies, investors are suggested should follow a passive investment strategy, which makes no attempt to beat the market. Investors should not select securities randomly according to their risk aversion or the tax positions. This dose not means that there is no portfolio management. 
In an efficient market, it would be superior strategy to have a randomly diversifying across securities, carrying little or no information cost and minimal execution costs in order to optimize the returns. The basic question related to market anomalies in whether an identified anomaly is evidence of a stable and long run phenomenon which an investment strategy could be based on or if it is just as the names suggest a short term unique miss pricing.

  12. Rose Offner says:

    Hi Jonathan, great work and writing. I appreciate your evaluations and the comparisons, which are very helpful. I had a different experience testing out the Dustball site. I typed in a paper I was drafting, since it was on my desktop, and it found two examples of plagiarism; both were the citations I used. It found both sources on Google and brought them right up. One had only three or four words from a PowerPoint on metacognition. It was really great. What I liked was that I had made an error and listed the wrong author, and there it was: I clicked the link and there were my sources. I will try out Copyscape. I want a site that students could use to become more conscious. Thanks for the great work, I look forward to reading more.

  13. Geoffrey Blackberry says:

    I guess using plagiarism detection software means trusting a service, because you should be very careful about giving out your work to someone else. I am a professor of English at the St. Michael's College and can share my experience of using an online plagiarism detection service. It is called www.plagiarismdetection.org. I have been using it for over 10 months. I have tried them in many ways. For example, I have scanned one document in Nov., let's say. Then I forget about it for a couple of months and scan that same document in March. It does not find any relativity to other documents, so I can be 100% sure these guys are not keeping the databases. Everybody has heard of scandals with turnitin and I don't want my students to participate in someone else's database gathering.

  14. riz says:

    Hi John, I'm a student and very worried right now. I had a chapter of my thesis, which is not due yet, and pasted it into Dustball and plagiarismchecker (around 4,000 words). It came up as OK. I am worried as to whether these sites will store the work I pasted in their databases. I do not want it to come up as plagiarised on turnitin when I submit my work….

    I also did the same for grammarly.com, but I didn't register, I only pasted my work onto it. Last of all, I did however upload a file with my work onto writechecker, which came up with like 6% plagiarised, which I don't understand.

    I am worried now as to whether these sites will save my work to databases or use it. Really worried, as I need to submit my thesis through turnitin.

  15. [...] The “Barely Useful” category was made up of Plagiarism Finder, Docoloc, CopyScape, Blackboard/Safe Assign, Plagiarisma, Compalitio, StrikePlagiarism and The Plagiarism Checker (Better Known as Dustball). [...]

  16. sondra burris says:

    Long white dress
    New queen to be
    Pretty bouquet of flowers
    And a lovely diamond ring
    Exciting celebration huge wedding bash
    many people come together
    In Britain they see at last

  17. Sue2506 says:

    Hi, I have a big problem! I have subscribed to plagiarism checker, but it seems there is no way to cancel the subscription, and it will automatically renew. Do you know how to cancel it? Thanks a lot!!!

  18. NN51 says:

    Same problem. They will not cancel it. They keep taking my money and their product doesn’t even work. I have asked them 10+ times to stop billing me. Plagiarism checker = scam!

  19. [...] that pop up and turn out not to be worth anything. They either produce low-quality results, barely offer any useful functionality or barely work at [...]
