ChatGPT, Copilot, Gemini: The Ultimate Face-off

Published On Tue May 28 2024
The results of evaluating the performance of 'ChatGPT', 'Copilot', 'Gemini', 'Perplexity', and 'Claude'

As AI accuracy improves, chat AIs that can handle everyday conversation smoothly, such as ChatGPT, Copilot, and Gemini, are appearing one after another. For ordinary users, however, it is difficult to judge which chat AI is the most capable.

Against this backdrop, The Wall Street Journal ran a test evaluating how well five chat AIs respond to everyday questions and published the results.

The Great AI Chatbot Challenge: ChatGPT vs. Gemini vs. Copilot vs. Perplexity vs. Claude - WSJ

https://www.wsj.com/tech/personal-tech/ai-chatbots-chatgpt-gemini-copilot-perplexity-claude-f9e40d26

When AI companies and researchers promote the performance of their own AI, they often cite scores measured with benchmark tools. However, a good benchmark score does not necessarily mean the AI can accurately answer the questions people ask in everyday conversation.

To that end, The Wall Street Journal evaluated the responses of five chat AIs, 'ChatGPT,' 'Copilot,' 'Gemini,' 'Claude,' and 'Perplexity,' by posing questions likely to arise in everyday conversation.

The questions used in the test were created in collaboration with Wall Street Journal editors and columnists, and covered a variety of categories such as 'health,' 'finance,' and 'cooking.'

For example, the cooking category included questions such as, 'Can you bake a chocolate cake without flour, gluten, dairy, nuts, or eggs? If so, please give me the recipe.'

These questions were entered into the five chat AIs, and the editors and columnists rated the responses for 'accuracy,' 'usefulness,' and 'overall quality' without knowing which AI had produced each one. The paid versions of the chat AIs were used for the test: ChatGPT ran on 'GPT-4o,' and Gemini on 'Gemini 1.5 Pro.'

The test results are as follows. Although performance varied by question category, Perplexity placed first in the overall evaluation. However, Perplexity also had the slowest response time of the five chat AIs. On coding questions, there was no significant difference among the five.
