A Comparison of Responses from Human vs. Chatbots in Mental Health Scenarios
Published in Vol 12 (2025)
1 Institute for Human-Centered AI, Stanford University, Stanford, CA, United States
2 PGSP-Stanford PsyD Consortium, Palo Alto University, Palo Alto, CA, United States
3 Dissemination and Training Division, National Center for PTSD, Menlo Park, CA, United States
4 Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, United States
5 Department of Medicine, Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
Background:
Consumers are increasingly using large language model–based chatbots to seek mental health advice or intervention due to ease of access and limited availability of mental health professionals. However, their suitability and safety for mental health applications remain underexplored, particularly in comparison to professional therapeutic practices.
Objective:
This study aimed to evaluate how general-purpose chatbots respond to mental health scenarios and compare their responses to those provided by licensed therapists. Specifically, we sought to identify chatbots’ strengths and limitations, as well as the ethical and practical considerations necessary for their use in mental health care.
Methods:
We conducted a mixed methods study to compare responses from chatbots and licensed therapists to scripted mental health scenarios. We created 2 fictional scenarios and prompted 3 chatbots with them to generate 6 interaction logs. We then recruited 17 therapists and conducted study sessions consisting of 3 activities. First, therapists responded to the 2 scenarios using a Qualtrics form. Second, therapists reviewed the 6 interaction logs using a think-aloud procedure to articulate their thoughts about the chatbots’ responses. Finally, we conducted a semistructured interview to explore their subjective opinions on the use of chatbots for supporting mental health. The study sessions were analyzed using thematic analysis. The chatbot interaction logs and therapist responses were coded using the Multitheoretical List of Therapeutic Interventions codes and compared with each other.
Results:
We identified 7 themes describing the strengths and limitations of the chatbots as compared to therapists: elements of good therapy in chatbot responses, conversational style of chatbots, insufficient inquiry and feedback seeking by chatbots, chatbot interventions, client engagement, chatbots’ responses to crisis situations, and considerations for chatbot-based therapy. In the comparison of Multitheoretical List of Therapeutic Interventions codes, therapists evoked more elaboration (Mann-Whitney U=9; P=.001) and used more self-disclosure (U=45.5; P=.37) than the chatbots. The chatbots used affirming (U=28; P=.045) and reassuring (U=23; P=.02) language more often than the therapists and also used psychoeducation (U=22.5; P=.02) and suggestions (U=12.5; P=.003) more often.
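As an illustration of the type of statistical comparison reported above, the following minimal sketch shows how per-response frequencies of a single Multitheoretical List of Therapeutic Interventions code could be compared between therapists and chatbots with a Mann-Whitney U test in Python. The counts, group sizes, and variable names are hypothetical placeholders for illustration only and are not data from this study.

# Illustrative sketch only: hypothetical per-response counts of one
# intervention code (eg, "suggestions") for therapists vs chatbots.
# The numbers below are made up for demonstration; they are not study data.
from scipy.stats import mannwhitneyu

therapist_counts = [0, 1, 0, 2, 1, 0, 1, 0, 1, 0]  # hypothetical values
chatbot_counts = [3, 4, 2, 5, 3, 4]                 # hypothetical values

# Two-sided Mann-Whitney U test comparing the two independent samples
u_statistic, p_value = mannwhitneyu(therapist_counts, chatbot_counts, alternative="two-sided")
print(f"U={u_statistic:.1f}, P={p_value:.3f}")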
Conclusions:
Our study demonstrates that general-purpose chatbots are not suited to safely engage in mental health conversations, particularly in crisis situations. While chatbots display elements of good therapy, such as validation and reassurance, their overuse of directive advice without sufficient inquiry and their use of generic interventions make them unsuitable as therapeutic agents. Careful research and evaluation will be necessary to determine the impact of chatbot interactions and to identify the most appropriate use cases related to mental health.
Society is facing a massive mental health challenge. The prevalence of mental illness and loneliness is rising, and there is a shortage of available mental health professionals [ ]. Access to proper treatment can thus be limited and expensive, especially in the United States. Recent advancements in the large language models (LLMs) that power artificially intelligent chatbots could offer an enticing option for those seeking help. Chatbots are always available, free or at low cost, and allow users to speak their minds freely without fear of judgment. In a recent study of a nationally representative sample of >1800 individuals, 24% indicated that they had used LLMs for mental health needs [ ].

The Role of AI Chatbots in Mental Health:
Today’s artificial intelligence (AI) chatbots can generally be grouped into 3 categories: AI assistants, AI companions, and AI character platforms. AI assistants are systems that help users with common everyday tasks in both their professional and private lives. AI companions are systems that let users interact with one central chatbot, which is customized over time, and are meant for leisurely or personal conversations. AI character platforms share the AI companions’ focus on private rather than professional conversations but differ in that they let users generate various chatbots and publish them on a platform where anyone can use them. The most studied use of chatbots for supporting mental health is social companionship, with apps such as Replika (AI companion) and Character AI (AI character platform). Emerging evidence suggests that LLM-based chatbots used as social companions can offer positive support and contribute to general psychological wellness [ , ]. Numerous studies, especially those focused on the AI companion app Replika, have found positive mental health outcomes of using chatbots, such as increased confidence and improved relationships with friends [ , ].

Studies of social companion agents have also reported that a clear boundary between therapy and companionship is hard to maintain. For instance, participants engage with chatbots for objectives beyond companionship, including using a chatbot as a therapist and an intellectual mirror [ ].

Potential for Therapeutic Use:
While there is much research on the use of chatbots as social companions, work on understanding their potential as therapeutic agents is only now emerging. Some research suggests that they have the potential to assist therapists and exhibit traits of both high- and low-quality therapy [ ]. Eshghie and Eshghie [ ] noted that, with carefully crafted prompts, ChatGPT is able to participate positively in conversations and offer validation and potential coping strategies.

The application of LLMs in the sensitive context of mental health comes with several risks, such as concerns around the ethical provision of services and the perpetuation of disparities and stigma [ ]. Research has raised several concerns that must be addressed to ensure the ethical and safe use of AI for mental health care [ ]. In addition, the American Psychological Association recently issued a letter to the Federal Trade Commission expressing significant concerns about the potential harm caused by chatbots that present themselves to consumers as “therapists” [ ].

There is emerging evidence that chatbots designed for mental health support could improve care for certain conditions, such as depression and anxiety [ , - ]. If chatbots were effective at improving mental health outcomes at scale, they could improve the mental well-being of millions at a low cost. These chatbots are publicly available and are already being used by many people as a form of mental health support [ ]. However, most general-purpose and social companion chatbots based on LLMs have not been thoroughly evaluated for these use cases. Given the potential benefits and risks of this novel tool, there is a need to evaluate these chatbots to understand their abilities and limitations in handling mental health conversations.