According to a recent article in Time Magazine, Sir Brian Vickers, a literature professor at the University of London, has proved that the play The Reign of Edward III was a collaborative work between William Shakespeare and Thomas Kyd, another famous playwright from the era.
His proof rests on plagiarism detection software: by comparing the play to Shakespeare’s known body of work, he was able to track phrases in Edward III that also appear in Shakespeare’s other works. This, combined with his intuition and decades of research into Shakespeare’s writing, led him to conclude that the work was a collaboration and that Shakespeare had written four scenes in total, three at the beginning and one later in the work.
According to Vickers, he found some 200 matching strings in the play, where 20 is the average. While the evidence seems damning and Vickers says this proves the collaboration “beyond a doubt”, I am not so quick to jump. I haven’t conducted Vickers’ test myself, though I may after Halloween season since the application he used is freely available, but I have significant doubts that this is the best way to resolve this debate.
Though I agree with Vickers’ conclusion, I base that agreement more on his expertise and knowledge than on his numbers, not because the numbers aren’t sound, but because they come from a flawed tool.
The Problem with Automated Analysis
Plagiarism detection systems are designed to answer a simple question: “Did person A use work from someone else?” To do this, they check for matches against sections of works, or against fingerprints of a work, and report what they find. They can’t make judgements about proper citation, coincidence, or malice. Those judgements, along with the final call of “Plagiarism”, have to be made by a human, and there is a lot of gray area where one person, even an expert, can see something totally different than another of equal calibre.
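To make the idea of "fingerprints" concrete, here is a minimal sketch in Python. It is my own illustration, not the method of any particular product: the function names, the five-word window and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib
import re

def shingles(text, n=5):
    """Return the set of n-word sequences in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def fingerprint(text, n=5):
    """Hash each n-word sequence so texts can be compared by hash (assumed scheme)."""
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles(text, n)}

def overlap(submitted, source, n=5):
    """Fraction of the submitted text's fingerprints that also appear in the source."""
    a, b = fingerprint(submitted, n), fingerprint(source, n)
    return len(a & b) / len(a) if a else 0.0
```

Note what the score does and does not say. It reports how much of one text also appears in another, and nothing about citation, coincidence or intent. Those calls remain with the human reading the report.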
But even with this simple task of matching text, plagiarism detection programs aren’t perfect and they are easily defeated. Rewriting a work, though requiring more effort than is likely worthwhile, can thwart automated detection schemes. Likewise, in my testing as well as that of others, these programs actually miss a great deal, even when the amount of plagiarism is known.
However, what Vickers did was use a plagiarism detection app to do something quite different and clever: look for matching strings to see if the same person wrote two or more works. The idea seems simple enough. Writing styles are like fingerprints, and most people who write a great deal tend to use the same expressions and sayings over and over again. Still, it is unclear that plagiarism detection software, meant to look for verbatim plagiarism, can be useful in this capacity.
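As a rough illustration of that kind of string matching, the sketch below counts the short phrases a disputed text shares with each candidate author's known works. To be clear, this is not Vickers' actual procedure or the application he used. The three-word phrase length and every name in it are hypothetical choices of mine.

```python
import re

def ngrams(text, n=3):
    """Return the set of n-word phrases in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(disputed, known_works, n=3):
    """Phrases in the disputed text that also occur in any of the known works."""
    known = set()
    for work in known_works:
        known |= ngrams(work, n)
    return ngrams(disputed, n) & known

def compare_candidates(disputed, candidates, n=3):
    """Count shared phrases per candidate; candidates maps a name to a list of texts."""
    return {name: len(shared_phrases(disputed, works, n))
            for name, works in candidates.items()}
```

A count like this only becomes meaningful next to a baseline, which is why the 200 versus 20 comparison matters. Even then, a high count shows that the phrasing is similar, not who put it there.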
Simply put, there are many ways to generate false positives. Many writers, for example, are very good at mimicking the style of another. In fact, many ghost writers do this regularly. There are also other ways for a writer’s style to end up in a work, including editing it or adding to the work of another. This prevents us from drawing any serious conclusions from the automated analysis alone, as there are many explanations that don’t involve Shakespeare writing the scenes in question.
In short, plagiarism detection software is most useful when we have two known sources (e.g. Person A turned in this paper, Person B turned in this one, etc.). Working backwards like this can only tell you that the scenes in Edward III have similar phrasing and word choice to other Shakespeare works. However, this is something scholars have known for a very long time and is exactly why the work was a suspected Shakespeare piece for some time.
In the end, I actually believe that Vickers is right. Not only does his explanation for the origin of the work make sense, but his knowledge and understanding of Shakespeare and other authors at the time is beyond reproach. Still, until we have some definitive proof that the work was authored by Shakespeare, there will always be doubt and there should always be.
Either way, the technology is imperfect even when used in its intended application and is completely untested for this particular use. Though it can definitely be useful for this kind of research, it is no substitute for human knowledge and human reading. There is no magic button to tell you if a work is plagiarized, much less if it was written by the same person. The human factor still has to come into play.
However, the technology to detect plagiarism has grown by leaps and bounds in the last ten years alone and we may soon reach a point where machines can effectively do more of the heavy lifting. We’re not there yet, but it isn’t too hard to envision it as an endgame sometime in the near future.
After all, 15 years ago the idea of using a computer to search for matching text in nearly every website, in research articles and in millions of other student papers was outrageous. Yet that is exactly what thousands of high schools and colleges do every day.