10 Dollar Articles Plagiarism Checker

By Jonathan Bailey • Feb 10th, 2009 • Category: Articles, Products

It has become all the rage in recent months for programmers to build or revamp plagiarism checkers using Google and other search engines. Most of these plagiarism checkers, such as the “Dustball” checker, fail to produce adequate results.

The problem is that phrase selection is not a simple task. It can be difficult for human beings to determine which phrases or sentences to search for, let alone a simple algorithm. As a result, such simplistic plagiarism checkers often either miss a large number of results by choosing phrases that don’t work well with the search engines or produce a slew of false positives by selecting phrases that are too common or too short.

Thus, when I read about a new SE-based plagiarism checker, this one by SEO content writing service 10 Dollar Articles (10DA), I was skeptical at best.

Though a cursory search confirmed many of my original suspicions, it also showed that the plagiarism checker isn’t quite as useless as many of its brethren. Though it has its flaws and certainly isn’t as useful as its marketing might say, it does have some interesting features and potentially compelling uses.

How it Works

The 10DA checker works like many similar services. Users copy an article or piece of content that they want to check for plagiarism, choose up to three services to use and hit the submit button.

The service then selects a series of five or six snippets from the work and runs them through each of the selected search indexes. When it’s done, it links to each of the results pages and the user can go through the results to see if there are any suspicious matches.
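As a rough illustration of the workflow just described, here is a minimal sketch in Python. It is not 10DA's actual code, which I have not seen. The even spacing of the snippets, the snippet length, the function names and the query URL formats in the ENGINES table are all assumptions made for the example.

```python
from urllib.parse import quote_plus

# Hypothetical engine table; the URL formats are assumptions for illustration.
ENGINES = {
    "google": "https://www.google.com/search?q={}",
    "yahoo": "https://search.yahoo.com/search?p={}",
}


def pick_snippets(text, count=5, words_per_snippet=8):
    """Take up to `count` evenly spaced runs of words from the text."""
    words = text.split()
    if not words:
        return []
    if len(words) <= words_per_snippet:
        return [" ".join(words)]
    last_start = len(words) - words_per_snippet
    count = min(count, last_start + 1)
    if count == 1:
        starts = [0]
    else:
        # Integer arithmetic keeps the start positions distinct and in range.
        starts = [i * last_start // (count - 1) for i in range(count)]
    return [" ".join(words[s:s + words_per_snippet]) for s in starts]


def build_queries(text, engines, count=5):
    """Return (engine, snippet, url) for every snippet/engine pair."""
    queries = []
    for snippet in pick_snippets(text, count):
        phrase = '"' + snippet + '"'  # quotes request an exact-match search
        for name in engines:
            url = ENGINES[name].format(quote_plus(phrase))
            queries.append((name, snippet, url))
    return queries
```

With five snippets and two engines, this produces ten links to results pages, each of which still has to be opened and read by hand.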

The site keeps track of which results pages you have already visited, turning those numbers to black, and also lets you recheck the article with a different set of search engines.

The Good

The real benefit of this system is that it is extremely simple to use and free. Since the product is in beta, anyone can paste text in and run it through the system.

The service also stands out somewhat in that it allows users to run the search through different search engines, unlike others that focus solely on Google. With the 10DA checker, you can easily search MSN and Yahoo! as well as Bloglines and more. Though many of the choices seem superfluous, especially the multiple Google services normally covered underneath the main search (such as searching either Wikipedia or Google Knol), the addition of extra choices is an interesting one.

That being said, it isn’t the first checker to offer this service. Others have been doing so for some time.

Though it is unclear how much benefit one gets from running the same article through three different engines, it is easy to see how those eager to be extremely thorough may be tempted by that feature.

The Bad

Where the 10DA checker struggles the most is in the value that it adds, or lack thereof. Where Copyscape compiles the results from the various Google queries it makes and displays them in a simple results page, 10DA requires users to click through to each individual results page and do the actual legwork themselves. At this time, the 10DA checker does not even provide an indication of the number of matches in the specific results pages.

Due to this, the results that one gets from the 10DA checker could be easily replicated by going to the individual search engines and doing the searches for yourself. The 10DA checker does not even automatically select to view similar matches, meaning that the initial display only includes one or two copies of the work in question.

With no match highlighting, organization or other input from the system, essentially it is the same as performing 5-18 individual searches at once. Since only one search is usually all that is necessary to prove that a work is plagiarized, one has to wonder how useful this really is.

My Tests

As with most plagiarism checkers I review, I ran the site through a short series of tests to see how the results compared with stock Google searches. Since the system still primarily uses Google, this would be a true “apples to apples” comparison.

The first test involved a prose work of mine that I know has been plagiarized many times before. I ran it through the 10DA checker and the best result of the six phrases checked in Google was 26 matches. However, after tinkering with the search term, namely by shortening it and removing punctuation, I was able to improve it to 31 results.

[Screenshots: 10DA results; tweaked results]

The reason for this is that the phrases the 10DA checker chooses seem, to me, to be extremely long. Where I can usually find a good statistically improbable phrase between 7 and 9 words long, all of the phrases chosen by the 10DA checker were over a dozen words, and some ran as long as 19.

Though the longer strings do reduce false positives, choosing a good unique phrase is more important in that regard. This is something that the 10DA checker struggled with as some of the results had only one match, indicating that the phrase selected was of poor quality.
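The kind of tweaking described above, stripping punctuation and cutting a long phrase down to a shorter window, can be sketched as follows. This is only an illustration in Python with hypothetical function names. In particular, scoring each window by its total character count is a crude stand-in I am assuming for "statistically improbable"; it is not how I pick phrases by hand, and replacing punctuation with spaces is likewise just one possible way to handle it.

```python
import re


def strip_punctuation(phrase):
    """Replace punctuation (apostrophes included) with spaces, then tidy up."""
    cleaned = re.sub(r"[^\w\s]", " ", phrase)
    return " ".join(cleaned.split())


def shorten(phrase, max_words=8):
    """Trim a phrase to the `max_words` window with the most characters.

    Longer words serve here as a rough proxy for rarer words; that is an
    assumption for illustration, not a measured property.
    """
    words = strip_punctuation(phrase).split()
    if len(words) <= max_words:
        return " ".join(words)
    best_start = max(
        range(len(words) - max_words + 1),
        key=lambda s: sum(len(w) for w in words[s:s + max_words]),
    )
    return " ".join(words[best_start:best_start + max_words])
```

A 19-word phrase passed through shorten() comes back as an 8-word, punctuation-free string that can be wrapped in quotes and searched directly.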

I also quickly tested the checker with a poem that I knew to be heavily plagiarized. However, many of the searches, due to an issue with apostrophes, came back as false negatives. Of those that did return matches, the highest had 25 results but, once again, by tweaking the search term, I was able to increase that number to 28. However, using my own phrase, I was able to find several hundred results.

[Screenshots: 10DA results; tweaked results; results for my own phrase]

(Note: The high number of results from my phrase is likely due in large part to matches on the same domain. However, in a cursory check of the first few pages of results, I did see at least some positive matches that were not in the first two.)

The end result is that most people will find it pretty trivial to get better results than the 10DA checker. If they can look at the phrase selected, remove punctuation and pull out a good section of unique content, they can increase the effectiveness of the search.

However, why one would do that is a bit of a mystery. If you’re going through all of these motions and need the added matches that come from a better phrase, you’re probably going to find it faster and easier just to pull the phrase yourself directly from the content and then perform your own search.

Conclusions

Even though the site’s marketing material says that it is both a competitor and a complement to Copyscape, Copyscape is by far a more useful service. Though 10DA seems to be about on par with the number of matches Copyscape catches, the usability of Copyscape is much higher and well worth the five cents per search in most cases.

Still, if you’re looking to do a quick plagiarism check of an article before you post it on your site, something my wife has to do as her company’s blog editor, it might be a useful service. If you don’t feel like setting up a Copyscape account or don’t mind the extra step of visiting the results, then it could be useful.

However, I cannot recommend this service for checking for duplicate content of your site’s material. You can get more accurate matches by hand and the amount of energy that is saved by using the 10DA checker is pretty minimal. Even the free version of Copyscape provides good matching and much higher usability.

But even that seems somewhat defeatist. With Fairshare bringing professional-grade matching technology and automatic updates to bloggers, there is no reason that bloggers or other RSS providers should be punching in their articles by hand to check for plagiarism.

Static content may have different needs, but with Google Alerts and CopyAlerts, there is little reason to manually check those results either.

In short, the age of copying and pasting textual content to see where it has appeared on the Web is fast ending. That is good news, though, as the easier to use and more automated the systems become, the more likely bloggers and other writers are to use them.

Hopefully, similar systems for images, audio and video are also fast coming.


Jonathan Bailey is The Webmaster and author of Plagiarism Today, which he founded in 2005 as a way to help Webmasters going through content theft problems get accurate information and stay up to date on the rapidly-changing field. He is also a consultant to Webmasters and companies to help them devise practical content protection strategies and develop good copyright policies.

  • We got tired of using Copyscape and made our own tool using AJAX and Google API.
    Thanks for doing some real research on the two checkers though. It's a simple algorithm to check for duplication but the challenge comes in making uncommon situations in submitted text bring back good results. Especially papers with quotes in them.

    http://searchenginereports.net/articlecheck.aspx
  • Thank you for your candid review of the software. The apostrophe thing is a real drawback, and I have a fix in place on my local version, I just haven't integrated it into the pretty-yet-limited version of the software that is available publicly. Even being the creator of this, I do not really disagree with much of anything stated here, for the most part. Just about everything mentioned will be adjusted or fixed, save the grabbing of matching results through an API or other method and displaying them on-site.

    While I think to appreciate what the software does, you would have to use it more than just on a handful of queries -- the market for this type of software is smaller than for the userbase of a site such as copyscape.com. It's not about saving five cents, but I know plenty of people who can't afford the $10/$20 to purchase copyscape credits -- the primary market of this software is content buyers who need to be as sure as they can that their purchased content was not lifted from some obscure, and possibly overlooked, source.

    In any case -- I fully appreciate your candid review.
  • Thank you very much for your response and please keep me posted as you add new features and fix known issues, I'll be very eager to hear.

    The question that I guess I have is that I know many people who use Copyscape to check an article that they purchased for plagiarism (something you can easily do with the paid version) and I'm curious what you feel the benefit of your service is over that one. Clearly you save the five dollars for the first check, so would you say that yours is for the "occasional" plagiarism checker and not the one that does it routinely?

    I would agree with that statement, but I am unsure exactly how large a group that is. I'd love to hear any thoughts you have.
  • I guess to better answer your question, I'll tell you why I wrote the script. I hire writers for a variety of things, and in some cases, that content is passed onto 3rd parties through a service (such as through 10dollararticles.com). I wanted to be sure that I was covering every possible angle when trying to detect potential content theft. This script was initially a tool for me to use with Copyscape -- I just happened to open it up to the public.

    Now, I am not fully aware of copyscape's resources that it uses, so until I can exhaustively compare results found with my software against the results that copyscape produces, I can't say with any assurance that my software will produce results that copyscape will not. However, what I can say is that, based on somewhat limited testing, the "niche" searches tend to give better results than general searches (through the software) in certain situations. This translates to better results for those who simply copy and paste a string into Google's main search interface to check for duplicate content.

    Say a writer finds a .doc file on the subject of gardening which is indexed in Google/Yahoo and uses that content. In my testing, the .doc documents were not always showing up at the top of the search results when you performed an exact-match search for a string in that document -- web pages that do not have exact-matching phrases will outrank the .doc files. Maybe this is a flaw with Google's/Yahoo's ranking algorithms, but I replicated it a few times with different documents, so it's something that caused me to add these options (doc/pdf searches) to the software.

    My willingness to update this software beyond bug fixes and a couple of small improvements really depends on whether more people start to use it. I am working on a way to match content contextually (rather than exact-matches) to help identify plagiarism, but something such as that wouldn't be added unless there was a stronger demand for the base software. And even then, I'm not sure it would be a good fit because you cannot expect a general user to understand contextual matches, or even false positives for that matter -- this is another inherent flaw in my software.

    I guess, at the end of the day, I can simplify things for you -- possibly. This is software that I was using personally, and put it on that particular website for two reasons -- targeted link bait, primarily; though I do enjoy writing programs for others to use. Even being up on the site for maybe one month, it's receiving visitors and it appears to have a few regular users. I don't really know if there's a huge market for it, but I found/find it useful, and I have to assume that there are others who will as well. I have no current intentions to develop that into a paid service -- so it really is just there to help others who find it useful, and also to improve the visibility of that particular website.

    Once I make some substantial changes to the software, I'll be sure to make a new comment on this page.
  • Chip
    "Since only one search is usually all that is necessary to prove that a work is plagiarized, one has to wonder how useful this really is."

    Agreed. I tried the tool out last night and the apostrophe trouble is a real pain. So is not knowing whether a search has a positive result or not. In its present state I don't see a reason to use it over inputting your own phrases in Google.
  • Me either truth be told, the only advantage I can see is for Copyscape users that don't mind the limitation in the ease of use but don't want to pay the five cents for the text search. Needless to say, that's a very small target audience...