Introduction
Since the rise of large language models such as ChatGPT, reports have grown of students submitting AI-generated content in their exams and earning good grades. To investigate this phenomenon, Peter Scarfe, a researcher at the School of Psychology and Clinical Language Sciences at the University of Reading, conducted a controlled experiment.
Experiment Details
Scarfe's team created more than 30 fake psychology student accounts and used them to submit GPT-4-produced answers to examination questions. The results confirmed the anecdotal reports: the AI use went largely undetected, and on average the AI-written answers outscored those of real students.
The AI-generated answers were submitted in five undergraduate modules that count toward a psychology bachelor's degree, spanning all three years of study. Submissions ranged from 200-word answers to short questions to 1,500-word essays.
Even though the answers were submitted without editing or any attempt to disguise their origin, 94 percent went undetected, and nearly 84 percent received better grades than those of a randomly selected group of students who took the same exams.
Challenges and Insights
Debriefing meetings with the exam markers produced a surprising insight: when AI submissions were flagged, it was not because the content read as robotic or repetitive, but because the answers were exceptionally good.
This raises important questions about how AI-generated content can be detected. Various tools, such as OpenAI's AI text classifier and Turnitin's AI writing detection system, have been explored for this purpose, but their real-world performance often falls short of expectations.
Detection Tool Performance
OpenAI's classifier has been reported to identify AI-generated text as “likely” AI only 26 percent of the time. Turnitin's system, by contrast, was advertised as detecting 97 percent of ChatGPT- and GPT-3-authored writing in a lab setting. Nevertheless, Scarfe's team found that the beta version of Turnitin's system performed notably worse in practice.
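As a rough illustration of what those headline rates mean in practice, the minimal sketch below shows how many AI-written answers each detector would be expected to flag out of a batch. Only the 26 percent and 97 percent figures come from the reports above; the batch size and helper function are hypothetical.

```python
# Illustrative only: hypothetical batch of AI-written submissions.
# The 0.26 and 0.97 rates are the quoted figures; everything else is assumed.

def expected_flags(num_ai_submissions: int, true_positive_rate: float) -> int:
    """Expected number of AI-written submissions flagged by a detector
    with the given true-positive rate."""
    return round(num_ai_submissions * true_positive_rate)

batch = 100  # hypothetical number of AI-written answers

print(expected_flags(batch, 0.26))  # OpenAI classifier figure -> ~26 flagged, ~74 missed
print(expected_flags(batch, 0.97))  # Turnitin lab-setting claim -> ~97 flagged, ~3 missed
```

Even under the optimistic lab figure, a handful of AI-written answers would slip through; at real-world rates, most of them would.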
While advances in AI technology have opened up new possibilities, they also pose challenges for maintaining academic integrity. The implications of AI-generated content in educational settings require careful consideration and the development of more reliable detection mechanisms.