Introduction
Since the rise of large language models such as ChatGPT, reports have grown of students submitting AI-generated content in their exams and earning good grades. To investigate this phenomenon, Peter Scarfe, a researcher at the School of Psychology and Clinical Language Sciences at the University of Reading, conducted a controlled experiment.
Experiment Details
Scarfe's team created more than 30 fake psychology student accounts and used them to submit GPT-4-produced answers to examination questions. The results confirmed the anecdotal reports: the AI use went largely undetected, and on average the AI-written answers outscored those of real students.
The AI-generated answers were submitted in five undergraduate modules that count toward a psychology bachelor's degree, spanning all three years of study. Submissions ranged from 200-word answers to short questions to 1,500-word essays.
Even though the answers were submitted without editing or any attempt to disguise their origin, 94 percent went undetected, and nearly 84 percent received better grades than those of a randomly selected group of students who took the same exams.
Challenges and Insights
Debriefing meetings with the exam markers produced a surprising insight: when AI submissions were flagged, it was not because the content read as robotic or repetitive, but because the answers were exceptionally good.
This raises important questions about how AI-generated content can be detected. Various tools, such as OpenAI's AI text classifier and Turnitin's AI writing detection system, have been explored for this purpose, but their real-world performance often falls short of expectations.
Detection Tool Performance
OpenAI's classifier has been reported to identify AI-generated text as “likely” AI only 26 percent of the time. Turnitin's system, by contrast, was advertised as detecting 97 percent of ChatGPT- and GPT-3-authored writing in a lab setting. Nevertheless, Scarfe's team found that the beta version of Turnitin's system performed notably worse in practice.
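As a rough illustration of what those headline rates mean in practice, the minimal sketch below shows how many AI-written answers each detector would be expected to flag out of a batch. Only the 26 percent and 97 percent figures come from the reports above; the batch size and helper function are hypothetical.

```python
# Illustrative only: hypothetical batch of AI-written submissions.
# The 0.26 and 0.97 rates are the quoted figures; everything else is assumed.

def expected_flags(num_ai_submissions: int, true_positive_rate: float) -> int:
    """Expected number of AI-written submissions flagged by a detector
    with the given true-positive rate."""
    return round(num_ai_submissions * true_positive_rate)

batch = 100  # hypothetical number of AI-written answers

print(expected_flags(batch, 0.26))  # OpenAI classifier figure -> ~26 flagged, ~74 missed
print(expected_flags(batch, 0.97))  # Turnitin lab-setting claim -> ~97 flagged, ~3 missed
```

Even under the optimistic lab figure, a handful of AI-written answers would slip through; at real-world rates, most of them would.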
While advances in AI technology have opened up new possibilities, they also pose challenges for maintaining academic integrity. The implications of AI-generated content in educational settings require careful consideration and the development of more reliable detection mechanisms.