Do AI Detection Tools Work? Does It Matter?
Why I am unsure of what to think about tools that supposedly detect AI-generated text.
[image created with DALL-E 2]
Welcome to AutomatED: the newsletter on how to teach better with tech.
Each week, I share what I have learned — and am learning — about AI and tech in the university classroom. What works, what doesn't, and why.
Let's consider whether even highly reliable detectors of AI-generated text would leave professors in a tough spot.
I set out to write this piece after reading about the professor from Texas who asked ChatGPT if his students had plagiarized their work with ChatGPT.
As Rolling Stone's Miles Klee reports, ChatGPT responded that it wrote all of the students’ papers and, subsequently, the professor angrily accused them of plagiarizing. But ChatGPT is not a truthteller in this respect. Here is Klee:
There’s just one problem: ChatGPT doesn’t work that way. The bot isn’t made to detect material composed by AI — or even material produced by itself — and is known to sometimes emit damaging misinformation. With very little prodding, ChatGPT will even claim to have written passages from famous novels such as Crime and Punishment. Educators can choose among a wide variety of effective AI and plagiarism detection tools to assess whether students have completed assignments themselves, including Winston AI and Content at Scale; ChatGPT is not among them.
What caught my eye was Klee's passing claim that tools like Winston AI are effective detectors of AI-generated text.
Although I planned to use this post to simply report the results of my tests of this claim, I ended up leaning towards the view that it does not matter whether it is true. Let me explain my thinking on this issue, which is still evolving.
🔎 Does Winston AI’s Detector Work?
In my investigation of the reliability of detectors of AI-generated text, I started by giving Winston AI some of the outputs of the free and publicly available ChatGPT (reliant on GPT-3.5; henceforth 'ChatGPT-3.5').
Winston AI accurately identified them as AI-generated with high confidence.
I got similar results when I tried prompting ChatGPT-3.5 in a variety of ways: I encouraged it to write like a high schooler, to change its patterns to avoid detection, to write like Paul Krugman, and so on. Switching to ChatGPT-4 did not affect Winston AI's ability to identify my submissions as AI-generated, either.
I then ran these outputs through Quillbot, which is an AI paraphraser that YouTubers and TikTokers have recommended undergraduates use to avoid detection.
Winston AI still gave the right answer: all of my submissions were AI-generated.
[screenshot: Winston AI successfully identifying AI-generated text]
Next, I tried Bard, Google’s competitor to ChatGPT. Regardless of my prompting, Winston AI accurately identified Bard’s responses as AI-generated.
When I interspersed my own sentences within the AI-generated text, Winston AI correctly distinguished the AI-generated passages from my sentences. Its confidence that the whole chunk of text was AI-generated declined in proportion to how many of my own sentences I added.
So far, Winston AI had accurately identified every confirmed instance of AI-generated text.
I turned to the negative case next: I wanted to see how well Winston AI handled text that I knew was not AI-generated. Long story short, it did a good job.
I gave it my own writing from AutomatED, some of my undergraduate papers, and my academic journal articles, as well as bits of the New York Times from the early 2000s. It identified all of them as human-generated, with high confidence in every case.
⁉️ The Gap between the Facts and the Evidence
These results are encouraging. They are certainly an improvement over the testing I did in March, which left me much more skeptical of the effectiveness of AI detection tools.
But the testing process left me with a sinking feeling for two reasons:
Without access to a lot more data, I cannot establish the validity of detectors of AI-generated text.
My tests provide some promising evidence that Winston AI reliably delivers true positives and true negatives, but they are not sufficient to establish Winston AI's claim to be “able to detect AI generated copy with a 99% accuracy.” I would need to run many more tests, with a much wider range of texts, to establish anything like that figure.
Now, I could just trust the detectors' developers. Maybe we should trust the team behind Winston AI when they claim that their detector is 99% accurate. We are reasonable when we trust all sorts of authorities about matters that we cannot evaluate ourselves.
But this takes me to the second reason for my sinking feeling…
Even if these detectors are very reliable, the core problem remains: they leave a problematic gap between the facts and the evidence.
I, as a professor, am obligated to make substantive claims in my teaching only if they are supported by evidence that can be verified in some way or another. That is, I should only make assertions in my capacity as a professor that are supported by evidence that my students could confirm themselves, were they appropriately trained, located, or otherwise positioned. Perhaps some claims can be taken on authority alone, but it is rare that I teach anything that is not open to some sort of transparent verification, at least in principle.
There is no such in-principle transparent verification in the case of detectors of AI-generated text. They are, in effect, black boxes: in goes text, out comes a judgment, and neither how the tool works nor why it reached a particular verdict is available to its users. The evidence is far from ideal.
More importantly, I think we are all obligated to provide transparent, verifiable evidence when we accuse someone of wrongdoing. Yet these tools do not give me any such evidence, so I cannot give it to my students. It is like punishing someone for theft for reasons that neither you nor they can check: “the theft detector says you stole, so away you go!”
In the "old days," practically every case of plagiarism was settled by either observing students plagiarize, getting students to admit to their plagiarism, or finding students' source texts after suspecting them of plagiarizing. As I reported in my piece from March on the depth of the AI plagiarism problem, I would generally seek evidence of student plagiarism by locating sources that they plagiarized from (or sources that had a common source with what they plagiarized). Accusing a student of plagiarism did not used to be a situation where the professor could lack transparent verifiable evidence.
The right sort of evidence cannot be obtained in the case of AI plagiarism, even if Winston AI and the other detectors work with 99% reliability. There is still a troubling gap between what is true and what can be proven.
Not only can a student accused of AI plagiarism simply deny the allegation, but the professor is also left worrying that this particular case falls in the 1%. This gap creates room for the professor to distrust their students, and that distrust erodes the learning environment.
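To make that worry concrete, here is a rough back-of-the-envelope sketch, not anything the detector vendors publish. The 99% figure is Winston AI's own claim; the assumption that 10% of submissions in a class are AI-generated is mine, purely for illustration:

```python
# Rough illustration (my assumptions, not Winston AI's published methodology):
# how often would a detector that is right 99% of the time flag honest work?
# Assumed: 99% of AI-generated texts are flagged (sensitivity), 99% of
# human-written texts are cleared (specificity), and 10% of submissions
# are AI-generated. The 10% base rate is a made-up number for this sketch.

base_rate = 0.10     # assumed share of submissions that are AI-generated
sensitivity = 0.99   # P(flagged | AI-generated)
specificity = 0.99   # P(cleared | human-written)

p_flagged = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
p_innocent_given_flag = (1 - specificity) * (1 - base_rate) / p_flagged

print(f"Share of flagged submissions that are human-written: {p_innocent_given_flag:.1%}")
# => about 8.3% on these assumptions
```

On those assumptions, roughly one in twelve flagged papers would be honest work, and nothing in the detector's output tells the professor which papers those are.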
I am not yet sure where I will ultimately stand on detectors of AI-generated text, but it is clear that they introduce complications that incentivize professors either to avoid altogether the assignments and activities that leave open the possibility of AI plagiarism, or to design assignments that encourage the use of AI tools.
I plan to continue to test AI detectors in the coming months, but for now, I cannot shake my concerns about their relevance.