Back in June, behavior scientist Jean-François Bonnefon tweeted about a rejection he had received from an unnamed scientific journal. What made the rejection interesting was that it didn’t come from a human being, but from a bot.
An automated plagiarism-detection tool had determined that his paper had a “high level of textual overlap with previous literature” and summarily un-submitted it. However, according to Bonnefon, the sections flagged were elements such as the authors’ affiliations and the methods, which are naturally going to be very similar to previous work.
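To see why this happens, it helps to look at how crude text-overlap detection can be. The sketch below is a hypothetical, greatly simplified version of such a check (real tools are far more sophisticated, and the example sentences are invented for illustration), but it shows how near-identical boilerplate, like a standard methods sentence, produces a high overlap score even between unrelated studies.

```python
# A minimal sketch of naive textual-overlap detection: compare the
# word n-grams two documents share. Real plagiarism detectors are far
# more sophisticated, but the failure mode is the same.

def ngrams(text, n=3):
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    """Jaccard similarity of word n-grams between two texts (0 to 1)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

# Two hypothetical methods sentences from entirely different studies:
methods_old = "Participants were recruited online and gave informed consent before the survey."
methods_new = "Participants were recruited online and gave informed consent before the experiment."

print(overlap(methods_old, methods_new))  # → 0.8, flagged as heavy "overlap"
```

A checker that only counts shared phrases cannot tell standard methods language apart from copied findings, which is exactly the failure Bonnefon describes.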
The story has since gained a decent amount of attention from the scientific community, many of whom have shared similar experiences.
However, the story is just a small peek into a much broader problem. Automated plagiarism-detection tools have been around for decades, and overreliance on them has been a problem for nearly as long.
Increasingly, the use of bots in peer review is moving past spelling, grammar and plagiarism checking. As the number of journals and submissions continues to rise, publishers are seeking more and more help from technology.
But that help may bring its own set of problems. Publishers would do well to learn the lessons of YouTube and other bot-heavy enforcement systems. Because, as we’ve seen time and time again, the introduction of bots often causes as many problems as it fixes.
Let the Bots Rise
The academic publishing industry has seen some insane growth over the past decade. According to one estimate at University World News, there are approximately 30,000 journals publishing some 2 million articles per year right now. That says nothing of the number of articles submitted. All of this comes as global scientific output is estimated to be doubling every nine years.
This rapid growth has put a huge burden on editors and peer reviewers alike. With only 20% of scientists performing peer reviews, this explosive growth is being shouldered by a relatively small group of researchers, many of whom have extremely short careers.
With those workloads increasing, publishers have been regularly turning to technology to help make the peer review process faster and more effective. Some key examples include:
- The peer review platform Scholar One teaming with UNSILO to produce an AI that can interpret natural language, extract the key findings and compare those findings to other papers in its database.
- Statcheck, an automated tool that can analyze the statistics in a paper and spot any abnormalities.
- ScienceIE, a competition for teams to create algorithms that could extract the basic facts and ideas from scientific papers.
- Publisher Elsevier has developed EVISE, an AI system that, in addition to checking a work for plagiarism, suggests potential peer reviewers and handles correspondence.
- Artificial Intelligence Review Assistant or AIRA, which combines other available tools to automatically detect plagiarism, ethical concerns and other potential issues in a paper so they can be flagged for further review. It also suggests peer reviewers.
These are just some of the systems that have either been developed or are in development that have the goal of automating pieces of the peer review process.
While none of these tools aim to replace human peer reviewers and, instead, seek to help them, they all still sit as a front line between the person submitting the paper and the peer reviewer. In all cases, a paper that runs afoul of these bots will face an uphill climb to publication.
This creates a series of uncomfortable questions. What if the bots are wrong? How will researchers respond? And how will it impact science?
Obviously we can’t see the future, but there are a few interesting analogs in our recent past.
The YouTube Case Study
Comparing YouTube to academic publishing might seem to be one of the most bizarre comparisons imaginable. However, they have two very important things in common: They both have a problem of getting far more submissions than can be easily vetted and, in both cases, their submitters are facing a “publish or perish” environment.
Also, in both cases, they’ve turned to increased reliance on bots and automated moderation to take as much of the load off of them as they can.
For YouTube, this process has been anything but smooth. Though their tools have done an admirable job at handling the vast majority of cases, they’ve proven woefully inadequate at handling the outliers.
Even just looking at its copyright problems, YouTube has become well-known as a place where videos are automatically (or semi-automatically) claimed or removed without cause. Some of this is just poor implementation of Content ID, but much of it is due to poor data and poor matching.
But, more to the point, these bots have changed YouTube culture. YouTubers often spend as much time trying to avoid Content ID or other YouTube bots as they do making their videos. Those who make a living on YouTube are constantly in a fight to avoid demonetization and copyright claims. That has an impact on what they create and how.
Though YouTubers are well known for loudly protesting YouTube’s policies, they have also been very quick to adapt. As YouTube policies and practices have changed, YouTubers have quickly adapted to ensure that their latest uploads aren’t caught in the filters.
It’s an ongoing cycle of YouTube making a change, either in policy, matching or content matched, and YouTubers moving to get around it.
With YouTube, this cycle is very quick because YouTubers typically know what the bots said about their videos almost instantly. With academic publishing that iteration period will be much longer because of the delay in getting the feedback. Still, there’s no reason to assume the same cycle won’t repeat.
However, we didn’t need YouTube to tell us that. We’ve already seen this cycle play out with another issue in academic publishing.
Even without the aid of advanced bots, editors and peer reviewers have sought out ways to streamline the process and make it easier to determine what research is valuable and what is not worthy of publication.
In recent years, one of the most used and most controversial of those tools has become the p value. To extremely oversimplify it, a p value (or p-value) is a measure of probability. It estimates how likely you would be to get the dataset you found in a study if the null hypothesis were true. As such, a lower p value is considered stronger evidence of a real effect, and often only papers with a p value of less than .05 are considered “publishable”.
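For a concrete sense of what that number means, here is a minimal example: if a supposedly fair coin comes up heads 60 times in 100 flips, the p value answers “how often would a truly fair coin do at least that well?” (This sketch uses a simple one-sided binomial test; real studies use a variety of statistical tests.)

```python
from math import comb

def binom_p_value(successes, trials, p_null=0.5):
    """One-sided p value: the probability of seeing at least this many
    successes if the null hypothesis (here, a fair coin) were true."""
    return sum(comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
               for k in range(successes, trials + 1))

# 60 heads in 100 flips of a supposedly fair coin:
p = binom_p_value(60, 100)
print(round(p, 4))  # ≈ 0.028 — under the .05 threshold
```

In other words, a fair coin would produce a result this lopsided less than 3% of the time, so by the .05 convention the result would count as “publishable” evidence that the coin is biased.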
To be clear, the use of the p value in this manner is very controversial. However, that hasn’t stopped many publications from using the p value as a gatekeeping tool.
Researchers have responded to this with what is known as p value hacking or p value manipulation. This can be done in many different ways, including excluding data that raises the p value, limiting the findings to those with an appropriately low p value and so forth.
You can play with p value hacking yourself using this tool at FiveThirtyEight. It lets you compare the economic performance of the United States under the major political parties and, by choosing what to include or exclude, you can easily produce “publishable” results showing both that the parties help and that they harm the economy.
In short, you can make the data say whatever you want and obtain a p value that’s low enough to be considered valid.
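The same game can be simulated in a few lines of code. The toy example below (a deliberate oversimplification, using a crude z test) generates pure noise, where there is no real effect at all, then searches for the pair of data points whose exclusion yields the lowest p value:

```python
import math
import random

random.seed(42)  # for reproducibility

def z_test_p(sample):
    """Two-sided z test p value against a true mean of zero.
    (A crude approximation, used purely for illustration.)"""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = mean / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

# Pure noise: by construction there is no real effect here.
data = [random.gauss(0, 1) for _ in range(40)]

# "Hack" the analysis: also consider every version of the dataset with
# two points excluded, and keep whichever gives the lowest p value.
candidates = [data] + [
    [x for k, x in enumerate(data) if k not in (i, j)]
    for i in range(len(data)) for j in range(i + 1, len(data))
]
best_p = min(z_test_p(s) for s in candidates)

print(f"all data: p = {z_test_p(data):.3f}; after exclusions: p = {best_p:.3f}")
```

Because the exclusions are chosen after looking at the results, the “best” p value is systematically biased downward even though the data contain no effect at all.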
The p value operates much like a bot or any other algorithm in that it parses complex data into a human-understandable value that helps editors and peer reviewers make a judgment about the paper. It’s just a matter of how useful that value truly is.
However, unlike bots whose algorithms and processes are kept secret, everyone knows how to calculate a p value and it is easy to keep working on your data until you get an appropriate number. While it’s unclear how common p value hacking is (though one 2015 study reported it to be “widespread”) and there are ways to minimize and protect against it, it was a natural and predictable response to the rise in the importance of the p value.
As bots begin to take on more and more of the load of peer reviewing and editing academic journals, we’ll likely see a similar response to them as researchers, eager to get published, will try to stay one step ahead of the bots.
Researchers, ultimately, are human beings and they will respond to any changes by attempting to adapt and overcome them. Though scientific publishing may not become quite the bot-laden dystopia that YouTube has become, it will face many of the same challenges.
But those challenges aren’t purely technical, they’re also human. For all of the obstacles AI faces in parsing research papers, it will also have to deal with researchers changing and adapting to the bots. It’s a game of cat and mouse we’re already familiar with on other platforms, but it may become an even bigger part of the future of scientific publishing.
One way that publishers can prevent the worst outcomes is to keep humans first and ensure that the bots play only an advisory role. Though humans are certainly flawed in their own ways and are often equally exploitable, the combination of humans and bots makes those exploits much more difficult, as each can shore up the other’s weaknesses.
Still, until we remove the “publish or perish” pressures on researchers, it’s important to note that any AI that is applied will not only impact the kind of research that gets published, but will also become a target for researchers to bypass.
It’s crucial to think about and plan for this before implementing these systems as waiting until after they’re in use may be too late. If we’re going to play cat and mouse, we need to make sure the mouse at least has a head start.