ChatGPT takes on the European Exam in Core Cardiology: an artificial intelligence success story?
Artificial intelligence (AI) has established a role across many industries and subfields, including natural language processing and computer vision, offering innovative approaches to a wide range of tasks and problems. The Chat Generative Pre-trained Transformer (ChatGPT) is a novel AI tool that has recently attracted considerable attention for its predictive power and potential applications.
Several recent studies have demonstrated that ChatGPT can accurately answer questions from undergraduate-level examinations such as the United States Medical Licensing Examination (USMLE), as well as questions from MBA courses. However, its ability to succeed in a more challenging, post-graduate examination such as the European Exam in Core Cardiology (EECC) was unknown.
The European Exam in Core Cardiology
The EECC is a knowledge-based assessment designed to provide a broad, balanced, and up-to-date test of core cardiology knowledge necessary for independent practice. It is the final theoretical exam for the completion of cardiology specialty training in numerous countries and consists of 120 multiple-choice questions (MCQs) covering the entire spectrum of cardiology.
Assessing ChatGPT's predictive power
In this study, we evaluated the performance of ChatGPT on the EECC to assess its predictive power on high-level, post-graduate examinations. We obtained 362 MCQ items from several sources, including the official European Society of Cardiology (ESC) website, StudyPRN, and Braunwald's Heart Disease Review and Assessment (BHDRA), and screened them to exclude any with visual or audio components.
We submitted each question from the resulting question bank to ChatGPT and compared the model's responses against the correct answers from the source material to determine its overall accuracy on the EECC questions.
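As a rough illustration of such a pipeline, the sketch below submits MCQs to ChatGPT through the OpenAI API and scores the replies against the answer key. The client setup, prompt wording, model name, and question_bank structure are all assumptions made for illustration; the exact prompting procedure used in the study is not specified here.

```python
# Minimal sketch of the evaluation loop (illustrative only).
# Assumes the OpenAI Python client (openai>=1.0) and a hypothetical
# question bank; the study's exact prompts and model version may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical structure: each item carries the question stem, lettered
# options, and the correct answer key from the source material.
question_bank = [
    {
        "source": "ESC sample",
        "stem": "Which physical finding is most specific for cardiac tamponade?",
        "options": {"A": "Pulsus paradoxus", "B": "...", "C": "...", "D": "..."},
        "answer": "A",
    },
    # ... remaining text-only MCQs
]

def ask_chatgpt(item):
    """Submit one MCQ and return the single-letter answer ChatGPT selects."""
    options = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
    prompt = (
        f"{item['stem']}\n{options}\n"
        "Reply with the letter of the single best answer."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()[0].upper()

correct = sum(ask_chatgpt(q) == q["answer"] for q in question_bank)
print(f"Overall accuracy: {correct}/{len(question_bank)} "
      f"({100 * correct / len(question_bank):.1f}%)")
```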
Results
Our results demonstrate that ChatGPT correctly answered 213 of 362 questions, an overall accuracy of 58.8% across all question sources. By source, it answered 42/68 (61.7%) of the ESC sample questions, 79/150 (52.6%) of the BHDRA questions, and 92/144 (63.8%) of the StudyPRN questions.
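For transparency, the overall figure follows directly from the per-source counts; the few lines of Python below (illustrative only) make the arithmetic explicit. Reproducing the reported one-decimal percentages exactly appears to require truncation rather than rounding:

```python
import math

# Per-source correct/total counts as reported above.
results = {"ESC sample": (42, 68), "BHDRA": (79, 150), "StudyPRN": (92, 144)}

def pct_truncated(correct, total):
    """One-decimal percentage, truncated rather than rounded
    (this reproduces the reported figures exactly)."""
    return math.floor(1000 * correct / total) / 10

for source, (correct, total) in results.items():
    print(f"{source}: {correct}/{total} = {pct_truncated(correct, total)}%")

overall_correct = sum(c for c, _ in results.values())  # 42 + 79 + 92 = 213
overall_total = sum(t for _, t in results.values())    # 68 + 150 + 144 = 362
print(f"Overall: {overall_correct}/{overall_total} = "
      f"{pct_truncated(overall_correct, overall_total)}%")  # 58.8%
```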
Although the EECC is considered a challenging examination, ChatGPT's ability to score near or above the pass mark represents a significant milestone. EECC questions demand deductive reasoning and a rational, stepwise approach grounded in a substantial body of knowledge.
Conclusion
In conclusion, our results demonstrate ChatGPT's strong predictive power and its potential applications in post-graduate examinations. However, the exclusion of questions with visual or audio content remains a significant limitation to its widespread use. Nevertheless, ChatGPT stands as a powerful tool for natural language processing tasks and a significant breakthrough for artificial intelligence.