Are AI Writing Detectors Getting Better?

The summer of 2023 was not a good time for AI detectors.
OpenAI famously pulled its AI detector, saying it wasn’t effective. A study by the Washington Post found serious gaps in the efficacy of AI detection systems, and academic researchers reached similar conclusions in their own study.
Things weren’t looking significantly better in the summer of 2024. That summer, a pair of studies found that AI detection was not working consistently and, to make matters worse, that humans were not much better than the bots at spotting AI writing.
However, in more recent months, the headlines have been getting somewhat brighter. A recent article by David Gewirtz at ZDNet found that three AI detectors were correct in every test he performed. An earlier examination by Chandra Steele at PC Mag had similar results.
This is a sentiment that Anna Mills, a long-time critic of AI detectors, recently echoed on her LinkedIn. After reviewing more recent literature, she’s decided that such detectors have a place in the process.
So, are AI detection tools getting better? If so, how much can we rely on them? The answers are complicated.
First, the Good News
The good news is relatively straightforward: the detection of AI writing is improving.
How much of an improvement is a matter of debate. However, there’s not much doubt that detection is improving.
But this should be an expected outcome. AI detection tools are constantly being honed and trained, and there is no shortage of human and AI-generated content to train these systems.
Though there are debates about whether and when generative AI will plateau, AI detectors may not face the same constraint.
For example, one of the struggles for generative AI systems is finding new content to train on. However, AI detectors have a nearly unlimited supply of AI-generated works on which they can train.
So, it does seem that AI detectors are genuinely improving. However, that may not be enough for many to breathe easier.
The Bad News
The bad news is multifaceted.
First, there’s a great deal of disparity between the various systems. In Gewirtz’s article, three tools successfully identified all five papers. However, three others correctly identified only two.
While decent systems exist, there’s also a lot of garbage. Teachers and others seeking AI detection tools need to be aware of this and keep up with changes over time. This is a rapidly evolving space.
Second, even the best systems will make mistakes, including false positives. Given that these systems are black boxes, and humans can’t verify or refute the findings, relying solely on them is not wise.
Finally, this is a game of cat and mouse. Various tools aim to “rewrite” or “humanize” AI-generated text. Likewise, as we saw previously, human editing of AI text can still be challenging to detect.
In many ways, it’s 2005 again. Back then, plagiarism detection in schools was still relatively new. Students played a lengthy game of finding ways to bypass or fool the systems. As plagiarism detection tools discovered new bypasses, they patched their systems. However, that may not be as practical with generative AI detection due to the systems’ “black box” nature.
In short, while the progress of AI detectors is impressive, there are also reasons to be skeptical.
Layering Swiss Cheese
So, where does this leave AI detection? That’s a difficult question, especially since everything can change tomorrow. This is a space in flux.
However, there are three simple facts that, most likely, won’t change and can’t be ignored.
- AI Detectors Will Never Be Perfect
- They Will Always Be Black Boxes That Humans Can’t Verify
- Bad Results, in Particular False Positives, Are Always a Risk
In her post, Mills recognized this, citing the “Swiss cheese” approach. She named Philip Dawson as her inspiration, and Dawson, in turn, credits Kiata Rundle (and her PhD advisors, Guy Curtis and Joseph Clare) with the idea.
The idea is simple. Any one layer or approach will have holes, including AI detectors. Rather than relying on a single layer, you take multiple approaches: crafting AI-resistant assignments, having more work done in a controlled environment (such as the classroom), and asking students questions after an assignment is turned in.
Every layer you add helps prevent and detect AI plagiarism while reducing false positives and minimizing the harm to non-cheating students.
As such, the question isn’t “Can teachers rely on AI detectors?” Instead, it should be “Should AI detectors be one of those layers?”
To me, that seems reasonable. But only if teachers understand the limitations.
Bottom Line
Those who are against using AI detection software have a simple fear: that teachers will make decisions based solely on the detector and then punish students who were simply the victims of false positives.
To be clear, this fear is not unfounded. Misuse is rife even with traditional plagiarism detection software. Even when teachers can quickly examine the results and make a human determination, many don’t. There have been too many stories of teachers and editors relying blindly on matching percentages and other indicators to define plagiarism.
It’s easy to assume that teachers will be even more inclined to misuse a system that demands more human investigation. It doesn’t help that we’re seeing headlines such as this one, where schools face pushback, both in and out of the courts, over dubious AI allegations.
Teachers are, by and large, overworked and underpaid already. They struggle with everything that is asked of them and now have these AI issues dumped on them as well. It is grossly unfair. I cannot fault teachers for struggling with these issues.
That said, if the choice is between using AI detectors solely or not at all, I would prefer not at all. Even though they are improving, that can change in a heartbeat, and the risks are too great.
However, they definitely have a place as part of a nuanced and layered strategy to reduce the unpermitted use of AI. The challenge is whether there is a will and the ability to implement such a strategy.
Much like traditional plagiarism detection, AI detection will never be “set and forget.” Using such tools improperly can and will harm students. However, nuance is difficult and, for some, it may be impractical.
That is a simple reality in many classrooms across the world.