Textbroker is a marketplace for custom-written articles. Though such sites are often derided for allegedly turning a blind eye to plagiarism, Textbroker seems to be doing something about it. The company has made available a freeware application named UN.CO.VER, which stands for UNique COntent VERifier, that it claims can both search for plagiarism within a work and seek out infringing copies of the content.
The application is written in Java, runs on Windows, Mac or Linux and is only a 2MB download. It claims to be able to check content that is pasted in, a single URL or even an entire domain in one swoop.
At the encouragement of a good friend of mine named Angela Swanlund, who was on Textbroker’s mailing list and was offered the chance to use UN.CO.VER, I decided to put the program through its paces and see if it lived up to its description.
The answer surprised even me.
How it Works
UN.CO.VER (UCV from here on) is a fairly straightforward plagiarism checker.
Out of the box you are presented with three options:
- Check Text: Lets you copy and paste in text for checking.
- Check Domain: Poorly named, this feature lets you check a single URL. You can set up beginning and ending strings to avoid checking unwanted text such as comments.
- Check Websites: This feature will attempt to crawl and perform a plagiarism check of an entire Web site, going up to 2 levels deep, though more is possible through manual selection.
In all three options, the process is pretty much the same. You set up the scan that you want and click “Check Now”. UCV will then go through the text and find any suspected duplicate content and report it in the space below. The process seems to take just a few seconds per 1000 words and, even in repeated testing, didn’t crash or create any errors.
The question, however, is how well the UCV works. To find the answer, I decided to put it through its paces, testing each option individually.
Where UN.CO.VER Gets Its Results
Before I began the tests, I noticed that the UCV site and manual were unusually tight-lipped about where they were getting their results from, something that made me instantly suspicious. So before doing any reviewing, I decided to find out.
Using a tool called Fiddler2 Web Debugger, I routed all of UCV’s traffic through Fiddler2’s proxy and listened to the Internet traffic.
What I found was that UCV was using the Yahoo Search API to find its results. Certainly not a bad way to do it and no reason to be secretive about it, but it is also a system available to everyone.
I also noticed that UCV was looking at the URLs provided by Yahoo and doing some kind of additional analysis as most of the URLs listed in Fiddler2 as being visited by UCV were not appearing in the results. This indicates that UCV may be more than a “dumb” plagiarism checker repeating results from Yahoo.
With that out of the way, I decided to do a few tests of UCV to see how well it performed.
Check Text Feature
To start testing the “Check Text” feature, I began with an article that had a known amount of reuse, namely last week’s column about how schools are hurting the fight against plagiarism.
Since UCV isn’t aware I run Plagiarism Today, it should report PT as a complete duplicate and, assuming no other reuse exists, nothing else.
Indeed, that is exactly what happened. UCV reported the original URL on Plagiarism Today and also listed a few other URLs that had an extremely small amount of matching content, all of which were false positives triggered by matches for strings such as “to cite sources”.
Next, I tried an old poem of mine that I knew to be widely copied and reused. UCV churned on the work and found seven copies of it. Two of those copies were on my site, though, leaving only five potential plagiarisms. By comparison, a quick search on Google easily found a dozen copies of the poem on sites other than mine, some legitimate, some not.
One thing that was impressive was that one of the results UCV turned up was a modified version of the poem that was only about 65% the same. Still, the results overall were incomplete.
Finally, in an attempt to see how it would handle “clean” text, I ran the draft of this article through the service. Surprisingly, UCV found some 24 potential matches though all were less than 5% of the unfinished article and were, once again, for very short strings such as “I decided to put”.
Toying with the sensitivity settings alleviated this problem some, but I wasn’t able to find a good balance between few false positives and good match detection.
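To illustrate why short strings such as “I decided to put” trigger matches, and why a sensitivity setting trades false positives against detection, here is a minimal sketch of one common way to score overlap: comparing runs of consecutive words (often called shingles). This is an assumption for illustration only; UCV does not document how it matches, and the function names and the parameter n are hypothetical.

```python
def shingles(text, n):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_matched(source, candidate, n=4):
    """Percentage of the source's n-word shingles that also appear in candidate.

    n plays the role of a sensitivity setting (hypothetical, not UCV's):
    a small n flags common phrases, a large n misses reworded copies.
    """
    source_shingles = shingles(source, n)
    if not source_shingles:
        return 0.0
    shared = source_shingles & shingles(candidate, n)
    return 100.0 * len(shared) / len(source_shingles)
```

With a short run length, an unrelated page that happens to share one stock phrase gets a small, nonzero score. Raising the run length removes that false positive, but it also makes it easier for a lightly edited copy to slip through, which is the balance I could not find in UCV’s settings.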
Check Domain Feature
Since I already had some idea as to UCV’s matching ability, I decided to simply test this feature on a short story of mine that I knew had a moderate amount of reuse.
I set up the system, being careful to include the beginning and ending phrases to avoid any extra content being searched, and let it do its thing.
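The beginning and ending phrases work like a simple trim: everything before the first phrase and after the last one is discarded before the check runs. A minimal sketch of that idea follows. The function name is hypothetical, and whether UCV keeps or drops the marker phrases themselves is an assumption; here they are kept.

```python
def extract_between(page_text, begin, end):
    """Return the part of page_text from `begin` through `end`, inclusive.

    Assumption: if a marker is missing, fall back to keeping the text
    on that side rather than discarding the page.
    """
    start = page_text.find(begin)
    if start == -1:
        return page_text
    stop = page_text.find(end, start)
    if stop == -1:
        return page_text[start:]
    return page_text[start:stop + len(end)]
```

For a page that wraps a story in a menu and a comment thread, passing the story’s opening and closing phrases leaves only the story itself to be checked.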
Unfortunately, UCV came back with no useful matches, just a handful of false positives. However, a quick search on Google found at least three copies of the work in addition to the one on my site. Though it was clear UCV had parsed the text correctly, it simply did not find the critical matches.
Needless to say, this was very disappointing but it seems to be an issue with UCV’s matching, not the URL check feature as it did work correctly.
Check Websites Feature
Finally, I decided to cut UCV loose on the whole of my site to see what it could do. I told it to check my entire old literature domain and do so 2 levels deep, meaning it should follow links on the home page and links on those secondary pages.
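For readers unsure what “2 levels deep” should cover, the sketch below walks a site’s link structure breadth-first and stops following links once it is two clicks from the home page. It works on an in-memory map of links rather than a live site, and the names are hypothetical; it is not a description of how UCV actually crawls.

```python
from collections import deque


def pages_within_depth(links, home, max_depth=2):
    """Map each page reachable from `home` in at most max_depth clicks
    to its depth. `links` maps a page to the pages it links to."""
    seen = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen
```

Under this reading, every page linked from the home page (level 1) and every page linked from those pages (level 2) should appear in the list to be checked.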
Almost immediately, I noticed that there seemed to be a lot of pages missing that should have been included as they were less than 2 levels deep. However, I decided to simply try with what it had and watched as UCV spun.
Unfortunately, that was the first problem. Though UCV seemed fairly quick when doing each item, it felt sluggish when going through so many. Though the time per item was about the same, for some reason the delay just felt that much longer when it was back-to-back with dozens of other works.
I ended up having to stop the search about halfway through as I couldn’t do other tests while this search was ongoing and I had some other checks to make. Still, I had enough time to let it get a few dozen results.
What I found with those results was a combination of a lot of false positives and a lot of inaccurate match totals. Since there was no way to tell UCV to ignore comments, it grabbed a lot of additional, unwanted content on every page and, combined with its sensitivity, threw back almost exclusively false positives. The few interesting matches it did find were listed with lower percentages than was accurate.
Without the ability to filter the content indexed, there isn’t much this feature can do. Unless your site exclusively has your own work (meaning no comments and almost no navigation, footer, etc.) then this feature is going to be wildly inaccurate for the most part.
To be completely honest, UCV performed better than I expected, but only because my expectations were so breathtakingly low. Everything about this application screams “unprofessional” from the broken EULA (that is also in the demo video) to the poor word choices, lack of documentation and amateurish layout. The app looks and feels like it should trip on its own shoelaces and die.
However, it actually does a reasonable job with certain kinds of plagiarism checks. Though my testing was limited, if a work was plagiarized, it did seem to detect it, with the exception of the short story. This means that UCV may be at least somewhat useful for its stated task, checking content for plagiarism, but not for trying to find every copy of an original work. Simply put, UCV was just too inaccurate to do the latter.
That being said, my greatest concern with UCV is its affiliation with a custom-writing site. Rightly or wrongly, these sites have become known as hotbeds for plagiarism (due in large part to their relatively low payout to writers and emphasis on rush jobs). This tool can be seen as an attempt to help writers avoid accusations of plagiarism while engaging in copying that would, if detected, be considered as such.
Whether this is the goal or not is hard to say. But it does seem odd that a custom writing site would offer a plagiarism-checking tool to its writers to check their own work for accidental content misuse.
While mistakes do happen, they are far less common than intentional misuse and that alone makes me feel strange about this application.
Still, looking solely at the merits of the application, it does seem to work reasonably well for what it is designed to do. However, with so many other great tools, including CopyScape and Plagium, already available, I don’t see what the benefit of a standalone application is. Sure, its URL checking feature is pretty cool, but it isn’t much easier than just copying and pasting what you need, and its full site check is just too slow and too inaccurate to be of much use.
In short, give it a try if you want and consider it a useful addition to your toolbox, but don’t make it your primary checker for any purpose.