Student vs. Chatbot: Accounting Exam Results Are In

Published On Sat May 13 2023
In a Battle Between ChatGPT and Accounting Students, the Students...

OpenAI's newest model, GPT-4, has been making waves in the education sector. The large language model generates natural-sounding text and has passed several prominent exams with flying colors, including the bar exam and 13 of the 15 AP exams, and it nearly earned a perfect score on the GRE Verbal test, according to OpenAI. This has sparked a debate about how AI chatbots should factor into education.

Recently, David Wood, a professor of accounting at Brigham Young University (BYU), and his team put ChatGPT to the test, recruiting as many professors as possible to see how the chatbot would fare against actual accounting students on classroom exams. The study drew 327 co-authors from 186 educational institutions in 14 countries, who contributed 25,181 classroom accounting exam questions. The team also recruited BYU undergraduates to feed another 2,268 textbook test bank questions to ChatGPT. The questions covered accounting information systems (AIS), auditing, financial accounting, managerial accounting, and tax.

The study found that while ChatGPT scored a respectable 47.4%, the students performed much better, with an overall average score of 76.7%. On 11.3% of the questions, ChatGPT beat the student average, doing particularly well on AIS and auditing. However, the chatbot fared worse on tax, financial, and managerial assessments, possibly because it struggled with the mathematical processes those subjects require, according to the researchers. In general, higher-order questions were harder for ChatGPT to answer, and it sometimes gave confident, authoritative-sounding explanations for incorrect answers or answered the same question differently on different attempts, the researchers noted.

The study also revealed other interesting trends: ChatGPT did better on true-or-false questions (68.7% correct) and multiple-choice questions (59.5%) but struggled with short-answer questions (between 28.7% and 39.1% correct). The researchers believe the newer model, GPT-4, could improve markedly on the accounting questions posed in their study.

Despite ChatGPT's respectable score, the students clearly came out ahead. The chatbot is not perfect, and trying to learn solely by using it is a fool's errand, as Jessica Wood, a freshman at BYU, puts it. Still, the researchers expect the chatbot to help improve teaching and learning, including by helping instructors design and test assignments, and students could use it to draft portions of a project.

Overall, the research has prompted the education sector to reflect on whether it is teaching value-added skills or information a chatbot can already supply. ChatGPT is a disruption, and educators need to assess where to go from here. The researchers believe that while the chatbot is not perfect, it is an opportunity to look at teaching and teaching assistants in a new light.