Is OpenAI's GPT-4o Model Facing a Performance Regression?

Published On Sun Nov 24 2024

ChatGPT's latest model may be a regression in performance ...

According to a new report from Artificial Analysis, OpenAI’s flagship large language model for ChatGPT, GPT-4o, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o-mini model.

Concerns and Analysis

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. OpenAI claimed improvements in creative writing ability and working with uploaded files, promising deeper insights and more thorough responses.

GPT-4 is Better Than GPT-3.5 - Here Are Some Key Differences

However, doubts have been raised regarding these claims. According to Artificial Analysis, independent evaluations show a decrease in performance compared to the August release of GPT-4o. The Artificial Analysis Quality Index dropped from 77 to 71, aligning it with GPT-4o mini's performance.

Performance Metrics

GPT-4o’s performance on the GPQA Diamond benchmark decreased from 51% to 39%, while its MATH benchmarks decreased from 78% to 69%. Interestingly, the model's response speed has more than doubled, now producing around 180 tokens per second compared to 80 tokens per second previously.

Recommendations and Conclusion

GPT-4o Mini: Is it better than GPT-4o? Will it replace GPT-3.5 turbo?

Based on the data, researchers suggest that the November 20th GPT-4o model may be a smaller version than the August release. Despite this regression, OpenAI has not revised prices. Developers are advised not to transition workloads from the August version without thorough testing.

About GPT-4o

GPT-4o was introduced in May 2024 to outperform existing models like GPT-3.5 and GPT-4. It boasts cutting-edge results in voice, multilingual, and vision tasks, making it suitable for advanced applications such as real-time translation and conversational AI.