Did Apple's Recent “Illusion of Thinking” Study Expose Fatal Shortcomings in Using LLMs for Artificial General Intelligence?
Researchers at Apple recently published a study titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”. The study, with its bold and attention-grabbing title, has stirred up considerable controversy in the AI community.
The Rise of Large Reasoning Models (LRMs)
Traditional Large Language Model (LLM) AI programs, such as ChatGPT, are trained on vast amounts of human-generated text to mimic human outputs in response to prompts. A recent development has been the integration of formal reasoning capabilities into these models, giving rise to Large Reasoning Models (LRMs). Leading models such as OpenAI’s GPT, Anthropic’s Claude, and DeepSeek now exist in both LLM and LRM versions.
The Study Findings
In their study, the authors tested both the regular LLM and the "thinking" LRM versions of Claude 3.7 Sonnet and DeepSeek on a set of puzzles, such as the Tower of Hanoi, whose difficulty can be increased in controlled steps. They found that while LRMs performed well on simple and moderately complex instances, they experienced a "complete collapse" in accuracy once complexity passed a certain threshold. Surprisingly, on the simplest instances, the regular LLMs actually outperformed the LRMs. Additionally, when provided with an efficient solution algorithm, the models failed to use it effectively.
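To make "complexity" and "efficient solution algorithm" concrete, here is a minimal sketch using the Tower of Hanoi, one of the puzzles in the Apple study. It is an illustration only, not the study's own code or prompts; the function names and peg labels are assumptions chosen for this example. It shows the standard recursive solution and how quickly the required number of moves grows with the number of disks.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves that transfers n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then restack.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


def is_valid_solution(n, moves):
    """Replay the moves and check the rules (hypothetical checker for this sketch)."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from this peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in (3, 10, 15):
        moves = hanoi(n)
        # The optimal solution always takes 2**n - 1 moves.
        print(n, len(moves), is_valid_solution(n, moves))
```

The algorithm itself is only a few lines, but the solution it produces doubles in length with every added disk: 7 moves for 3 disks, 1,023 for 10, and 32,767 for 15. A model asked to write out every move therefore faces a rapidly growing output even though the underlying procedure never changes.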

Implications and Reactions
The study's findings have fueled skepticism within the AI community, with critics arguing that LLMs cannot truly understand or reason through problems that fall outside their training data. This skepticism has significant implications for AI progress and investment, as companies like Meta and Google are investing heavily in developing artificial general intelligence.
Ars Technica provided a balanced account of the controversy, acknowledging that pattern-matching machines can still perform specific tasks efficiently.
Counterarguments and Debates
On the other hand, defenders of AI have presented counterarguments to the Apple study. Alex Lawsen from Open Philanthropy published a rebuttal titled “The Illusion of the Illusion of Thinking”, challenging the notion of complete collapse in reasoning models. Lawsen demonstrated that with slight operational modifications, LRMs could effectively tackle high-complexity tasks, contradicting the Apple study's conclusions.
The debate over the potential of artificial general intelligence continues, with varying perspectives on the capabilities and limitations of current AI models.










