Though I recognize that most of the readers of this site are Webmasters eager to protect their content, I also know that more than a few developers of plagiarism detection tools read this blog. For them, I wanted to do a quick post about the upcoming Spanish Society for Natural Language Processing 2009 conference, which is hosting a PAN workshop on plagiarism analysis, authorship identification and “social software misuse”.
As part of this PAN workshop, Yahoo! Research is hosting what it calls the 1st International Competition on Plagiarism Detection, which it hopes to make an annual event.
The competition pits plagiarism detection systems against one another to test their accuracy and completeness.
Specifically, there are two tasks:
- External Plagiarism Analysis: This task provides contestants with suspect documents and source documents and requires the system to find the plagiarized passages.
- Intrinsic Plagiarism Analysis: This task requires contestants to detect plagiarized passages WITHOUT comparison to outside documents, for example, by detecting shifts in writing style.
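For readers less familiar with the two tasks, here is a rough sketch of the simplest possible approach to each. It is a minimal Python illustration under my own assumptions, not how any competition entry actually works. The external check looks for runs of five-word sequences (word n-grams) shared by a suspect and a source document. The intrinsic check splits a document into 50-word windows and flags any window whose average word length sits unusually far from the rest of the document. The function names, the n-gram length, the window size and the threshold are all hypothetical choices.

```python
import re
from statistics import mean, pstdev


def words(text):
    """Lowercase word tokens with their (start, end) character offsets."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def external_matches(suspect, source, n=5):
    """External analysis (hypothetical sketch): return (start, end) character
    spans in `suspect` covered by word n-grams that also occur in `source`.
    Overlapping or adjacent hits are merged into one passage."""
    src_tokens = [w for w, _, _ in words(source)]
    src_grams = {tuple(src_tokens[i:i + n]) for i in range(len(src_tokens) - n + 1)}
    sus = words(suspect)
    spans = []
    for i in range(len(sus) - n + 1):
        gram = tuple(w for w, _, _ in sus[i:i + n])
        if gram in src_grams:
            start, end = sus[i][1], sus[i + n - 1][2]
            # Extend the previous span if this n-gram overlaps or touches it.
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], max(spans[-1][1], end))
            else:
                spans.append((start, end))
    return spans


def style_outliers(text, window=50, threshold=2.0):
    """Intrinsic analysis (hypothetical sketch): score each `window`-word chunk
    by average word length and return the indices of chunks more than
    `threshold` standard deviations from the document mean."""
    toks = [w for w, _, _ in words(text)]
    chunks = [toks[i:i + window] for i in range(0, len(toks), window)]
    scores = [mean(len(w) for w in c) for c in chunks]
    if len(scores) < 2:
        return []
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # perfectly uniform style: nothing stands out
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > threshold]


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank"
    suspect = "In my essay the quick brown fox jumps over the lazy dog as I wrote."
    for start, end in external_matches(suspect, source):
        print(repr(suspect[start:end]))
```

Real systems use far richer style features and have to cope with obfuscated or translated text, but the basic split is the same: one approach needs the source documents, the other works from the suspect text alone.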
The competition provides the documents to be tested: an estimated 20,000 source and 20,000 suspect documents of various sizes, containing varying amounts and kinds of plagiarism. The documents are primarily in English, and the plagiarism has been “perpetrated” by a software application that randomizes the amount of text plagiarized, the degree of obfuscation and even, in some cases, translation.
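To give a sense of what “perpetrated” plagiarism might look like, here is a hypothetical Python sketch of such a generator. It is an assumption for illustration only, not the organizers' actual software. It copies a random-length passage from a source document, applies a crude obfuscation (randomly swapping neighboring words), drops it into a random spot in a suspect document and records where it landed, so that a detector's output could later be scored against that ground truth. It makes no attempt at translation, and the function names and parameters are my own.

```python
import random


def obfuscate(words, rng, swap_rate):
    """Crude, hypothetical obfuscation: randomly swap adjacent words."""
    words = list(words)
    for i in range(len(words) - 1):
        if rng.random() < swap_rate:
            words[i], words[i + 1] = words[i + 1], words[i]
    return words


def inject_plagiarism(suspect, source, rng, max_len=30, swap_rate=0.2):
    """Copy a random-length passage from `source` (assumed non-empty) into a
    random position in `suspect`. Returns the new text and the
    (offset, length) of the inserted passage as ground truth."""
    src = source.split()
    length = rng.randint(1, min(max_len, len(src)))  # random amount plagiarized
    start = rng.randint(0, len(src) - length)
    passage = " ".join(obfuscate(src[start:start + length], rng, swap_rate))
    sus = suspect.split()
    pos = rng.randint(0, len(sus))  # random insertion point (word index)
    prefix = " ".join(sus[:pos])
    # The passage starts right after the prefix and its joining space, if any.
    offset = len(prefix) + (1 if prefix else 0)
    new_text = " ".join(sus[:pos] + [passage] + sus[pos:])
    return new_text, (offset, len(passage))


if __name__ == "__main__":
    rng = random.Random(42)
    text, (off, ln) = inject_plagiarism(
        "my own original words about something else entirely",
        "the quick brown fox jumps over the lazy dog near the river bank",
        rng,
    )
    print(text)
    print("plagiarized passage:", repr(text[off:off + ln]))
```

Recording the exact offset and length of each inserted passage is what makes it possible to grade tens of thousands of documents automatically.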
The competition is open to commercial plagiarism checkers but requires that submissions be provided in a set XML format so that the organizers can process the results automatically, given the sheer volume of plagiarism cases to be scored. This may mean that some services have to “hack” their output to fit the competition's standards.
The winning product receives 500 Euros and submissions are being accepted until June 7th, 2009. Please see the link above for the specific rules.
This is not the first time a broad array of plagiarism detection suites has been put to the test. In November of last year, Dr. Debora Weber-Wulff, a professor at the University of Applied Sciences in Berlin, announced the results of her second round of testing and gave the top prize to Copyscape.