Is OpenAI's GPT-4.1 Less Reliable Than Its Predecessors?

Published On Thu Apr 24 2025

OpenAI's GPT-4.1 may be less aligned than the company's previous models

In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which it claimed “excelled” at following instructions. But the results of several independent tests suggest the model is less aligned — that is to say, less reliable — than previous OpenAI releases.

Owain Evans on X: 'Emergent misalignment update: OpenAI's new GPT4 ...

Skipping the Safety Report Step

When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model wasn’t “frontier” and thus did not warrant a separate report.

Evaluating GPT-4.1 Behavior

That spurred some researchers — and developers — to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor. According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a “substantially higher” rate than GPT-4o.

Independent Tests by SplxAI

A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar tendencies. In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and allows “intentional” misuse more often than GPT-4o. SplxAI posits that GPT-4.1’s preference for explicit instructions is to blame.

GPT-4.1 Model Faces Scrutiny: Alignment and Stability Concerns Raised

Concerns and Mitigation Efforts

“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. The company has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the independent tests’ findings serve as a reminder that newer models aren’t necessarily better across the board.