Decoding ChatGPT's Impact on Medical Question Answering

Published On Mon Sep 30 2024

Analyzing Question Characteristics Influencing ChatGPT's Performance in Answering USMLE® Step 2CK Questions

Artificial intelligence (AI) and large language models like ChatGPT show promise for medical applications, yet their performance requires comprehensive evaluation. This study assessed ChatGPT’s ability to answer USMLE® Step 2CK questions, analyzing its performance across medical specialties, question types, and difficulty levels in a large-scale question set. The aim was to assist question writers in developing AI-resistant exam questions and to give medical students a realistic understanding of how AI can enhance their active learning.

A total of 3302 USMLE® Step 2CK practice questions were extracted from the AMBOSS© study platform. After excluding 302 image-based questions, 3000 text-based questions remained for analysis. The questions were entered manually into ChatGPT, and its accuracy was evaluated across question categories and difficulty levels.

Performance Analysis

ChatGPT answered 57.7% of all questions correctly. Performance was highest in the category “Male Reproductive System” (71.7%) and lowest in the category “Immune System” (46.3%). Performance was lower on table-based questions, and question difficulty correlated negatively with performance (rs=−0.285, p<0.001). Longer questions tended to be answered incorrectly more often (rs=−0.076, p<0.001), with a significant difference in length between correctly and incorrectly answered questions.
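The rs values above are presumably Spearman rank correlation coefficients, which can be read as the Pearson correlation of the ranked data. The following is a minimal sketch of that calculation, not the study's actual analysis code: the function names are hypothetical, the difficulty and correctness values are made up for illustration, and no p-value is computed.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up illustration data, NOT the study's data:
# difficulty level per question and whether the answer was correct (1) or not (0).
difficulty = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
correct = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

rs = spearman_rs(difficulty, correct)
print(f"rs = {rs:.3f}")  # negative when harder questions are missed more often
```

A negative rs, as in this toy data set, indicates that correctness tends to fall as difficulty rises. Values close to zero, such as the −0.076 reported for question length, indicate only a weak monotonic relationship even when statistically significant in a large sample.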

Implications for Medical Education

ChatGPT demonstrated proficiency close to the passing threshold for USMLE® Step 2CK. Performance varied by category, question type, and difficulty. These findings help medical educators make their exams more AI-proof and inform the integration of AI tools like ChatGPT into teaching strategies. For students, understanding the model’s limitations and capabilities helps ensure that it is used as an auxiliary resource to foster active learning rather than misused as a replacement for studying. This study highlights the need for further refinement and improvement of AI models for medical education and decision-making.

Artificial Intelligence in Medicine

Artificial intelligence (AI) combines machine learning and deep learning strategies to automate tasks that traditionally required human intelligence. Deep learning techniques employ artificial neural networks with multiple layers to address complex problems. They have been deployed in a wide range of medical scenarios, including automating grading tasks, evaluating clinical patient data, and simulating surgical outcomes before surgery.


Recently, ChatGPT has emerged as a revolutionary chatbot that uses a large language model and deep learning to generate analytic, human-like responses to both medical and non-medical questions. Its test performance has been substantiated by larger-scale studies of Step 1, Step 3, basic science, and shelf examination questions, showcasing ChatGPT’s potential in medical test taking.


The USMLE® is a standardized test required for medical licensure in the USA. It consists of three steps: the first two exams are commonly taken by second- and fourth-year medical students, respectively, while Step 3 is typically taken by physicians with at least 6 months of postgraduate medical experience. With the recent transition of USMLE® Step 1 to a pass/fail system, the residency matching process is thought to place greater emphasis on USMLE® Step 2CK scores as the pivotal objective parameter.

Hence, a multi-faceted and comprehensive analysis of question characteristics influencing ChatGPT's performance in answering USMLE® Step 2CK questions provides valuable insights for educators, students, and the future integration of AI in medical education.