Students Perform Better than ChatGPT in Accounting Exams
A recent study led by Brigham Young University (BYU), together with researchers from 185 other universities, found that students performed better on accounting exams than ChatGPT, OpenAI's chatbot. Even so, the researchers were impressed with ChatGPT's performance and said it has the potential to change the way we learn and teach for the better.
The study, published in the journal Issues in Accounting Education, aimed to determine how well OpenAI's technology would fare on accounting exams. According to the findings, students scored an overall average of 76.7%, while ChatGPT scored 47.4%. ChatGPT beat the student average on 11.3% of the questions, but it performed worse on tax, financial, and managerial assessments. Researchers suspect this is because ChatGPT struggled with the mathematical processes those question types require.
ChatGPT uses machine learning to generate natural-language text, and the study found that it performed better on true/false questions (68.7% correct) and multiple-choice questions (59.5% correct) but struggled with short-answer questions (scoring between 28.7% and 39.1%). Researchers noted that higher-order questions were especially challenging for ChatGPT. They also found that it made nonsensical mathematical errors, such as incorrect additions or divisions.
To determine how well ChatGPT fared against actual university accounting students, the study's lead author, BYU accounting professor David Wood, recruited as many professors as possible to contribute exam questions. In all, 327 co-authors from 186 educational institutions in 14 countries participated in the research, contributing 25,181 classroom accounting exam questions. Undergraduate BYU students also fed 2,268 textbook test-bank questions to ChatGPT. The questions covered accounting information systems (AIS), auditing, financial accounting, managerial accounting, and tax, and varied in difficulty and type (true/false, multiple choice, and short answer).
Notably, the researchers found that ChatGPT sometimes fabricated information. For instance, it supplied a reference that appeared legitimate but was entirely invented: neither the authors nor the cited work existed. It also offered explanations for incorrect answers and sometimes answered the same question inconsistently.
In conclusion, although ChatGPT has its limitations, it's clear that it has the potential to revolutionize the education sector, and it will be interesting to see how AI technology will continue to evolve in the future.