Chatbot Showdown: Analyzing the Effectiveness of ChatGPT and Gemini on Viral Hepatitis

Published On Mon Jan 13 2025

ChatGPT And Gemini Show Similar Performance On Viral Hepatitis

This study evaluates and compares the performance of the ChatGPT and Gemini chatbots when answering questions related to viral hepatitis. Though both chatbots demonstrated effectiveness, nuanced differences emerged, particularly when handling guideline inquiries.

Artificial intelligence models are becoming increasingly integrated into healthcare as people look for reliable sources of accurate health information. Evaluations like this one matter more as health misinformation can carry significant public health consequences.

Evaluation of ChatGPT and Gemini

ChatGPT and Gemini were each asked 176 questions drawn from three sources: public-oriented material from the CDC, recommendations based on international guidelines, and frequently asked questions circulating on social media. The study, conducted in March 2024, aimed to assess the robustness and reliability of the chatbots’ responses.

The results indicated similar performance between the two chatbots. The overall mean scores were closely matched, at approximately 3.55 for ChatGPT and 3.57 for Gemini. Correct response rates were also broadly comparable, with ChatGPT answering 71.0% of questions correctly and Gemini 78.4%. These findings highlight the potential utility of both chatbots, particularly for questions derived from recognized health organizations like the CDC.
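To put the percentages in absolute terms, the short sketch below converts the reported rates into approximate question counts. It assumes both rates were computed over all 176 questions. The resulting counts are back-calculated estimates rather than figures reported by the study, and the function name is purely illustrative.

```python
TOTAL_QUESTIONS = 176  # total number of questions stated in the article


def approx_count(rate_percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Estimate how many questions a reported percentage corresponds to.

    Assumption: the percentage was calculated over `total` questions.
    """
    return round(rate_percent / 100 * total)


# Reported correct-response rates (from the article).
chatgpt_correct = approx_count(71.0)
gemini_correct = approx_count(78.4)

# Estimated difference in correctly answered questions.
gap = gemini_correct - chatgpt_correct

print(f"ChatGPT: about {chatgpt_correct} of {TOTAL_QUESTIONS}")
print(f"Gemini:  about {gemini_correct} of {TOTAL_QUESTIONS}")
print(f"Estimated gap: about {gap} questions")
```

Under that assumption, the 7.4-point difference corresponds to roughly a dozen questions, which helps explain why the overall results are described as comparable while still leaving room for the differences seen on guideline questions.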

Despite these promising scores, both models struggled with more nuanced guideline-based questions, with correct-response rates falling short of what professional medical guidance requires. For example, only 49.4% of ChatGPT’s answers to guideline questions were deemed completely correct, compared with 61.4% of Gemini’s.

Findings and Implications

Answer reproducibility, assessed to gauge consistency between responses, was high for both chatbots: 91.3% for ChatGPT and 92% for Gemini. Cohen’s kappa indicated substantial agreement among evaluators for both chatbots, and the two tools emerged on roughly equal footing as sources of accurate health-related information.
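For readers unfamiliar with Cohen’s kappa, the statistic measures how often two raters agree after discounting the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement. The sketch below is a minimal illustration, not the study’s own analysis. The two rating lists are hypothetical, and the article does not report the scoring scale or the kappa values themselves.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores from two evaluators for ten answers (illustrative only).
evaluator_1 = [4, 4, 3, 4, 2, 4, 3, 4, 1, 4]
evaluator_2 = [4, 4, 3, 3, 2, 4, 3, 4, 2, 4]

kappa = cohen_kappa(evaluator_1, evaluator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the evaluators agree on 8 of 10 answers, with a chance agreement of 0.38, giving a kappa of about 0.68. On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are labelled "substantial" agreement, which is the band the article’s wording suggests.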

The Future of Health is Digital Disruption

Generative AI technologies present challenges as well as solutions. They are seen as capable of broadly informing the public about pressing health issues like viral hepatitis, but their shortcomings point to the need for improvements that reduce the rate of misleading information.

Ongoing advancements and updates to these AI tools could bolster their accuracy and reliability over time, marking them as integral to public health discourse moving forward. Future inquiries may explore refining their algorithms and reassessing their effectiveness as reliable communication tools within healthcare settings.

This exploration lays the groundwork for further research into AI’s place in the increasingly digital health education space, potentially shaping how communities gather and process health information.