OpenAI's GPT-4.1 may be less aligned than the company's previous models
In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which it claimed “excelled” at following instructions. But the results of several independent tests suggest the model is less aligned — that is to say, less reliable — than previous OpenAI releases.
Skipping the Safety Report Step
When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model wasn’t “frontier” and thus did not warrant a separate report.
Evaluating GPT-4.1 Behavior
That spurred some researchers — and developers — to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor. According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a “substantially higher” rate than GPT-4o.
Independent Tests by SplxAI
A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar tendencies. In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and allows “intentional” misuse more often than GPT-4o. SplxAI posits that GPT-4.1’s preference for explicit instructions is to blame.
Concerns and Mitigation Efforts
“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. OpenAI, for its part, has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the independent tests’ findings serve as a reminder that newer models aren’t necessarily better across the board.
Conclusion
In a similar vein, OpenAI’s new reasoning models hallucinate — i.e. make stuff up — more than the company’s older models.
© 2025 Yahoo.