BYU Study Shows ChatGPT's Shortcomings in Accounting Exams

Published On Sat May 13 2023

ChatGPT Performs Worse Than Students on Accounting Exams: Study

OpenAI's chatbot, ChatGPT, performed worse than students on accounting exams, according to a recent study published in the journal Issues in Accounting Education. The researchers, led by Brigham Young University and drawn from 186 educational institutions, found that students scored an overall average of 76.7 percent, compared with ChatGPT's 47.4 percent. Although the researchers called ChatGPT's performance "impressive," they noted that the chatbot struggled with the mathematical processes required for tax, financial, and managerial accounting assessments.

ChatGPT relies on machine learning to generate natural-language text. In the study, it did better on true/false questions (68.7 percent correct) and multiple-choice questions (59.5 percent correct) but performed poorly on short-answer questions (between 28.7 and 39.1 percent correct). The researchers found that ChatGPT often explained its answers even when those answers were incorrect. At times it even selected the wrong multiple-choice option despite giving an accurate description of the correct one.

The researchers cautioned that ChatGPT's difficulty with higher-order questions, together with its tendency to make up facts and give wrong answers, could have serious consequences. They also noted that ChatGPT sometimes wrote authoritative-sounding explanations for incorrect answers and gave inconsistent answers to the same question.

Lead study author David Wood, a BYU professor of accounting, recruited as many professors as possible to see how ChatGPT would compare against university accounting students. In total, 327 co-authors from 186 educational institutions in 14 countries participated in the research, contributing 25,181 classroom accounting exam questions. Undergraduate BYU students also fed an additional 2,268 textbook test-bank questions to ChatGPT. These questions covered accounting information systems (AIS), auditing, financial accounting, managerial accounting, and tax, and they varied in difficulty and type (true/false, multiple choice, and short answer).

Overall, the study suggests that while ChatGPT's performance is impressive, it still falls well short of human students on accounting exams.