Alibaba's Qwen3 Outperforms OpenAI's o1 and o3-mini, on Par with Google's Gemini 2.5 Pro
Chinese tech giant Alibaba has released Qwen3, a family of open-weight AI models available in sizes ranging from 0.6B to 235B parameters. The models can be deployed locally using tools such as Ollama and LM Studio, or accessed through a web browser via Qwen Chat.
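As a minimal sketch of what local deployment looks like, the snippet below queries a Qwen3 model through the HTTP API of a locally running Ollama server; the model tag used here is an assumption and should match whichever variant has actually been pulled.

```python
# Minimal sketch: query a Qwen3 model served locally by Ollama.
# Assumes the Ollama server is running on its default port (11434)
# and a Qwen3 variant has been pulled; the tag below is an assumption.
import json
import urllib.request

payload = {
    "model": "qwen3:0.6b",  # assumed tag; substitute the variant you pulled
    "prompt": "Explain what an open-weight model is in one sentence.",
    "stream": False,        # request a single JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default generate endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```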
Performance Comparison
Qwen3's 235B-parameter model performs strongly across a range of benchmarks, outperforming OpenAI's o1 and o3-mini models on mathematical and programming tasks and reaching rough parity with Google's Gemini 2.5 Pro on several of them. It falls behind OpenAI's newly released o4-mini model on some benchmarks, however.

In coding benchmarks, Qwen3's 235B model scored 70.7%, compared with 80% for o4-mini. On the AIME 2024 math benchmark, o4-mini scored 94%, while Qwen3's 235B model scored 85.7%.
Benchmark scores for other newly released models can be found on Artificial Analysis. Other Qwen3 variants also improve on their predecessors, with the 30B-parameter variant outperforming DeepSeek-V3 and OpenAI's GPT-4o on benchmarks.
Expert Opinion
Simon Willison, co-creator of the Django web framework, shared his thoughts on the Qwen3 release in a blog post. He praised how well the release was coordinated across the LLM ecosystem, highlighting the models' out-of-the-box support in popular LLM serving frameworks and the effort Alibaba put into compatibility across systems and devices.
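Many such frameworks expose an OpenAI-compatible endpoint, which means existing client code can be repointed at a self-hosted Qwen3 model with a one-line change. Below is a sketch under assumptions: a vLLM-style server listening on localhost:8000 and serving the model under the ID Qwen/Qwen3-32B.

```python
# Sketch: call a locally served Qwen3 model through an OpenAI-compatible
# API, as exposed by serving frameworks such as vLLM. The base URL,
# port, and model ID are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local server, not OpenAI's cloud
    api_key="not-needed-locally",         # placeholder; local servers often ignore it
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",  # assumed model ID registered with the server
    messages=[{"role": "user", "content": "Summarize the Qwen3 release."}],
)
print(response.choices[0].message.content)
```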

Willison also commented on the breadth of Qwen3 sizes, noting that smaller variants like the 0.6B and 1.7B models can run smoothly on devices like iPhones, while even the larger 32B variant can fit on a well-specced laptop with room to spare.
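A rough back-of-the-envelope calculation shows why the small variants fit so comfortably: at 4-bit quantization, a model needs about half a byte per parameter for its weights alone. The estimates below are approximations that ignore activation memory and runtime overhead.

```python
# Rough estimate of the memory needed to hold model weights at a given
# quantization level. Approximation only: weights, not activations or
# runtime overhead.
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for params in (0.6, 1.7, 32):
    print(f"{params}B parameters @ 4-bit ≈ "
          f"{weight_memory_gb(params, bits_per_weight=4):.1f} GB of weights")

# Prints roughly:
#   0.6B parameters @ 4-bit ≈ 0.3 GB of weights
#   1.7B parameters @ 4-bit ≈ 0.8 GB of weights
#   32B parameters @ 4-bit ≈ 16.0 GB of weights
```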
Recent Updates
The Qwen3 family of models succeeds the Qwen2.5 models. Prior to this release, Alibaba introduced QwQ, a 32-billion-parameter model that showed performance comparable to DeepSeek-R1 despite its much smaller size.

The company also launched QwQ-Max-Preview, an iteration of Qwen2.5-Max specialized for mathematics and coding tasks.