Update: Just as this story was about to go live, I received word that Timothy Parker had stepped down, at least temporarily, from both USA Today and Universal. More details as they come.
When one thinks of plagiarism in newspapers, one usually thinks of errant journalists copying and pasting the work of other writers, not whether the crossword puzzle contains plagiarized or duplicative material.
However, that’s exactly the question facing crossword puzzle fans as a recent report by Oliver Roeder for the FiveThirtyEight blog highlights a growing plagiarism scandal in the industry.
At the heart of the scandal sits Timothy Parker, who is both the editor of the USA Today Crossword and editor of the Universal Crossword, a heavily syndicated crossword feature. On February 25th, Ben Tausig, the editor of the American Values Club crossword, tweeted that a crossword he wrote for USA Today in 2004 was tweaked and run again in 2008 and then again in 2015.
Tausig was able to learn about this thanks to a database of over 50,000 crossword puzzles compiled by software engineer Saul Pwanson, who also wrote the software for comparing the puzzles to one another to determine originality. What Pwanson found, when he looked at all of the puzzles published since 2003, was that 5.4% of Universal puzzles and 16.0% of USA Today puzzles had greater than 25% similarity to previous puzzles.
That percentage is in sharp contrast to other publishers, such as The New York Times and the Wall Street Journal, both of which had less than 0.5% with such a high overlap.
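To make the idea of a "25% similarity" threshold more concrete, here is a rough sketch of one way two puzzle grids could be compared. To be clear, this is my own illustration and not Pwanson's actual method: the `grid_similarity` function, the `#` marker for black squares and the tiny sample grids are all assumptions made up for the example.

```python
# Hypothetical similarity measure: the share of letter squares that hold
# the same letter in both grids. This is an assumption for illustration,
# not the measure used by Pwanson's software.

def grid_similarity(grid_a, grid_b):
    """Return the fraction of non-black squares that match in both grids."""
    if len(grid_a) != len(grid_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    ):
        raise ValueError("grids must have the same dimensions")

    matching = 0
    compared = 0
    for row_a, row_b in zip(grid_a, grid_b):
        for cell_a, cell_b in zip(row_a, row_b):
            if cell_a == "#" and cell_b == "#":
                continue  # black square in both grids, nothing to compare
            compared += 1
            if cell_a == cell_b:
                matching += 1
    return matching / compared if compared else 0.0


# Two made-up 3x3 grids that differ by a single letter.
older_puzzle = ["CAT", "A#O", "TON"]
newer_puzzle = ["CAT", "A#O", "TIN"]

similarity = grid_similarity(older_puzzle, newer_puzzle)
flagged = similarity > 0.25  # the threshold quoted in the report
print(f"similarity: {similarity:.1%}, flagged: {flagged}")
```

A real comparison would have to deal with grids of different sizes, shifted themes and reworded clues, which is part of why a purpose-built tool was needed.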
Parker, for his part, has denied any wrongdoing. Though he admits to posting under pseudonyms and that there is similarity between the themes of his crosswords, he said the similarities are simply caused by coincidence, not malicious plagiarism.
But despite the denial, the story has caused shockwaves through the crossword community and raised some seriously difficult questions about originality in crossword puzzles. Questions that will not be easy to answer, especially since the Pandora’s box of automated detection has been opened.
Understanding Crossword Puzzles (and Crossword Plagiarism)
To better understand the nature of the scandal, we first have to understand that there are two types of clues and answers in most crossword puzzles:
- The Theme: The theme answers are the core answers to the puzzle. They are typically the longest, most creative and most important answers in the puzzle. They are based on the topic of the puzzle and are what gives the puzzle its identity.
- The Fill: The fill consists of the other answers that finish off the puzzle. They are not related to the topic or title of the puzzle and are often created with computer assistance.
To that end, the scandal around Parker centers around two different scenarios, each with unique implications. To keep the reporting consistent, I’ll use the same terms and definitions that Roeder did in his original report:
- Shady: These are puzzles where Parker is alleged to have reused themes and theme answers identical to puzzles found elsewhere, most commonly The New York Times. Though less of the actual puzzle is duplicative, since it is the theme answers at issue, it is still seen as questionable. There are 65 of these puzzles at issue.
- Shoddy: These are puzzles where Parker is alleged to have repeated puzzles in Universal or USA Today that were published earlier. These can be thought of as self-plagiarism, as Parker is repeating his previous puzzles (or previous puzzles he licensed) but presenting them as something new. They have much higher matching percentages, but are a different debate both ethically and legally. There are hundreds of these types of puzzles.
The crossword community is mostly focused on the “shady” cases because, in those, Parker is accused of lifting themes and theme answers from The New York Times, widely considered the standard-bearer in crossword puzzles. However, both types of cases seem to paint a picture of a crossword editor who routinely cuts corners and takes shortcuts to produce the product.
To that end, Parker has been defending himself, even if the data doesn’t seem to back up his claims.
Statistics and Crossword Plagiarism
For Parker, the best explanation for these similarities is pure coincidence. He even went as far as to say that, given the number of puzzles he’s published, he would expect hundreds with such similarities.
And to be clear, there have been situations where themes in puzzles have been duplicated by accident. In 2012 puzzle creator Matt Gaffney wrote an article for Slate detailing how an Edgar Allan Poe-themed puzzle he had created had nearly identical theme answers to a previous puzzle by Mike Shenk.
The similarities were just a striking coincidence caused by similarities between the topics and the limitations of crossword construction. Gaffney legitimately claimed to be unaware of the other puzzle.
There have been plenty of other striking tales of accidental duplication, including the bizarre case of Dennis the Menace, where an amazing coincidence led to a seemingly impossible act of accidental duplication.
But while Parker may claim that this type of coincidence is the cause for this duplication, the statistics show otherwise.
According to Pwanson’s database, the New York Times has published over 5,000 puzzles since 2003. Of those, only 0.1% have had more than 25% similarity with previous works. At USA Today under Parker’s editorship, 16.0% of the nearly 4,000 puzzles have been more than 25% similar. That’s a 160x increase in the probability a puzzle is unoriginal.
If pure coincidence and accident were to blame, we would likely see similar rates of duplication at other crossword puzzle publishers. However, the highest rate at any other provider is Newsday’s, at 1.1%, still less than 1/12 that of USA Today.
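For anyone who wants to check the arithmetic, the comparison can be reproduced in a few lines. This is just a sketch using the percentages quoted above; the `rates` dictionary and the variable names are my own and not part of Pwanson's tooling.

```python
# Share of puzzles with more than 25% similarity to an earlier puzzle,
# as quoted in this article from Pwanson's database.
rates = {
    "The New York Times": 0.001,  # 0.1%
    "Newsday": 0.011,             # 1.1%
    "Universal": 0.054,           # 5.4%
    "USA Today": 0.160,           # 16.0%
}

# How many times more likely a USA Today puzzle is to be flagged
# than a New York Times puzzle.
usa_today_vs_nyt = rates["USA Today"] / rates["The New York Times"]

# Newsday's rate expressed as a fraction of USA Today's rate.
newsday_vs_usa_today = rates["Newsday"] / rates["USA Today"]

print(f"USA Today vs. NYT: about {usa_today_vs_nyt:.0f}x")
print(f"Newsday as a share of USA Today: {newsday_vs_usa_today:.3f}")
print(f"Less than 1/12? {newsday_vs_usa_today < 1 / 12}")
```

The ratios only compare flagged rates between publishers. They say nothing on their own about why any individual puzzle was flagged.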
So, while it is true that there are limitations inherent in crossword design that increase the likelihood of accidental duplication, the statistics show that Parker’s publications are outliers in the industry, making it far less likely that it was accidental in nature.
Opening Pandora’s Box
But while the story about Parker and his alleged misdeeds is huge, there’s a much, much larger picture here and it centers around what Pwanson has created.
Up until that database, there was no way for people to quickly and easily check to see how original crossword puzzles were. If plagiarism in crossword puzzles were to be detected, it would have to be done by eagle-eyed readers or authors. While that certainly happened from time to time, as with Gaffney’s story, it clearly only caught a small percentage of the duplication.
In a strange way, crossword authors are where reporters and writers were in the 90s as the first large-scale plagiarism detection tools were being created. Prior to the development of document fingerprinting, there was just no way to check a text work against a large database of other works (such as the Internet).
That technological leap has unveiled plagiarisms new and old. Plagiarisms thought to be buried deep in the past have been brought to light. We’ve seen this a lot with German, Romanian and Russian politicians, many of whom have had decades-old dissertations challenged over plagiarism allegations.
Now something similar is starting to happen in the field of crossword puzzles and Parker is its first victim. However, it’s almost certain that there will be others.
As the technology improves and the database grows, it will do the same thing that text and image plagiarism detection have done in their fields: expose duplications and put a blight on the records of many who felt they were safe.
Though Parker claims that this is coincidence or par for the course in crossword puzzles, the statistics paint a different picture. While it’s almost certain that the nature of crossword puzzles makes coincidental duplication more likely, the stats show clearly that it’s not a common phenomenon and that Parker’s work is an extreme outlier.
As I was writing this, Will Shortz, the New York Times crossword editor since 1993 and the person who was the victim of the duplication, commented on the story, saying, “When the same theme answers appear in the same order from one publication to the next, that makes you look closer. When they appear with the same clues, that looks suspicious. And when it happens repeatedly, then you know it’s plagiarism.”
I tend to agree.
But now the crossword community is faced with a different task: determining what should happen to Parker. In short, “What is the punishment for crossword plagiarism?”
While this isn’t the first time that the crossword community has faced plagiarism allegations, it’s definitely the first time with such serious allegations being levied against such a key figure in the industry. The community has also never faced such a public and mainstream scandal.
How the community and Parker’s employers react is going to set the tone for the entire industry and that tone will impact how others view it. A community that claims to cherish originality but fails to tackle plagiarism holds little credibility.
Hopefully, whatever happens to Parker, the outcome will be fair, transparent and swift. Fortunately, those investigating it will have some new tools to streamline the process and make things go a little more quickly.