Unmasking ChatGPT Cheating: A Statistical Analysis Approach

Published On Thu Aug 15 2024

New FSU research shows statistical analysis can detect ChatGPT use on multiple-choice exams

Newswise — As use of generative artificial intelligence continues to extend into all reaches of education, much of the concern about its impact on cheating has focused on essays, essay exam questions and other narrative assignments. The use of AI tools such as ChatGPT to cheat on multiple-choice exams has largely been overlooked.

A Florida State University chemist is half of a research partnership whose latest work is changing what we know about this type of cheating. Their findings reveal how the use of ChatGPT to cheat on general chemistry multiple-choice exams can be detected with specific statistical methods. The work was published in the Journal of Chemical Education.

Identifying ChatGPT Usage

“While many educators and researchers try to detect AI-assisted cheating in essays and open-ended responses, such as Turnitin AI detection, as far as we know, this is the first time anyone has proposed detecting its use on multiple-choice exams,” said Ken Hanson, an associate professor in the FSU Department of Chemistry and Biochemistry.


Researchers collected FSU students’ responses from five semesters’ worth of past exams, input nearly 1,000 of those questions into ChatGPT and compared the outcomes. Average scores and raw statistics were not enough to identify ChatGPT-like behavior: certain questions ChatGPT always answered correctly or always answered incorrectly, so its overall score was indistinguishable from the students’.

Using fit statistics, the researchers fixed the ability parameters, refit the response data and found that ChatGPT’s response pattern was clearly different from that of the students.
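To make the idea concrete, here is a minimal sketch, not the authors' published code, of the quantities this kind of analysis relies on: the Rasch (one-parameter logistic) probability of a correct answer given a fixed ability and per-question difficulties, and an unweighted (outfit) mean-square person-fit statistic built from squared standardized residuals. The function names and NumPy implementation are assumptions for illustration.

```python
# A minimal sketch of Rasch-style person-fit, not the published analysis code.
# Under the Rasch (1PL) model, P(correct) depends only on the gap between a
# respondent's ability (theta) and a question's difficulty. With theta held
# fixed, the outfit mean-square averages squared standardized residuals, so a
# response pattern that contradicts the question difficulties stands out.
import numpy as np

def rasch_prob(theta, difficulties):
    """P(correct) under the Rasch model for one respondent across all questions."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulties)))

def outfit_msq(responses, theta, difficulties):
    """Unweighted (outfit) mean-square person-fit statistic."""
    p = rasch_prob(theta, difficulties)
    variance = p * (1.0 - p)                  # Bernoulli variance per question
    z_sq = (responses - p) ** 2 / variance    # squared standardized residuals
    return float(z_sq.mean())                 # roughly 1 when responses fit the model
```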

Behavior Differences

On exams, high-performing students frequently answer both difficult and easy questions correctly, while average students tend to answer some difficult questions and most easy questions correctly. Low-performing students typically answer only the easy questions correctly. But on repeated attempts to complete an exam, ChatGPT sometimes answered every easy question incorrectly and every hard question correctly. Hanson and Sorenson used these behavioral differences to detect the use of ChatGPT with almost 100% accuracy.
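As a purely hypothetical illustration (made-up numbers, not the study's data), the snippet below contrasts two respondents who both score 5 out of 10 on questions ordered from easy to hard: one misses only the hardest questions, the other misses only the easiest. Average score cannot separate them, but the outfit statistic from the sketch above can.

```python
# Hypothetical numbers only: ten questions ordered easy -> hard, ability fixed at 0.
# Both patterns score 5/10, so raw scores look identical, yet the outfit
# mean-square is small (about 0.4) for the conventional pattern and inflated
# (about 3.7) for the inverted, ChatGPT-like pattern.
import numpy as np

difficulties = np.linspace(-2.0, 2.0, 10)             # easy -> hard
student  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # misses only hard questions
inverted = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # misses only easy questions

for label, pattern in (("student", student), ("inverted", inverted)):
    p = 1.0 / (1.0 + np.exp(-(0.0 - difficulties)))       # Rasch P(correct), theta = 0
    msq = np.mean((pattern - p) ** 2 / (p * (1.0 - p)))   # outfit mean-square
    print(label, round(msq, 2))
```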

The duo’s strategy of employing a technique known as Rasch modeling, together with fit statistics, can be readily applied to any generative AI chatbot; each will exhibit its own distinctive response patterns, helping educators identify when a chatbot was used to complete a multiple-choice exam.

The research is the latest publication in a seven-year collaboration between Hanson and machine learning engineer Ben Sorenson. Hanson, who earned his doctorate in chemistry from the University of Southern California in 2010, has published more than 100 papers and holds over a dozen patents.

To learn more about Hanson’s research and the FSU Department of Chemistry and Biochemistry, visit chem.fsu.edu.
