DeepSeek AI Accused of Training on Google Gemini Outputs Amid ...
Chinese AI lab DeepSeek is under renewed scrutiny following the release of its updated R1 model, with researchers suggesting it may have been trained on outputs from Google’s Gemini models.

Allegations Against DeepSeek
Developer Sam Paech pointed to linguistic similarities between DeepSeek’s R1-0528 and Gemini 2.5 Pro, claiming in a post on X that the model’s phrasing patterns suggest a shift from OpenAI-derived to Gemini-generated synthetic data. The developer behind the SpeechMap evaluation tool added that DeepSeek’s internal reasoning “traces” read like those produced by Gemini.
Concerns and Countermeasures
While model similarities alone don’t prove misuse – as the open web fills with AI-generated text, models increasingly converge on similar phrasing – experts warn that the risk of “AI slop” contaminating training data is growing. As a countermeasure, OpenAI and others have begun restricting API access and summarizing model reasoning traces to hinder unauthorized distillation.

“DeepSeek is short on GPUs and flush with cash,” said AI2 researcher Nathan Lambert. “Using synthetic data from top-tier models would be a logical shortcut.”
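The shortcut Lambert describes – distillation via synthetic data – amounts to harvesting a stronger “teacher” model’s answers and using them as supervised training data for a smaller “student” model. A minimal sketch of the data-collection step is below; `teacher_model` is a hypothetical stand-in for a real hosted-model API call, and the chat-style JSONL layout is one common supervised fine-tuning format, not any lab’s confirmed pipeline.

```python
import json

def teacher_model(prompt: str) -> str:
    # Hypothetical stand-in: a real pipeline would call a hosted
    # teacher model's API here and return its generated answer.
    return f"Synthetic answer to: {prompt}"

def build_distillation_dataset(prompts: list[str]) -> str:
    """Pair each prompt with the teacher's output and serialize the
    pairs as chat-style JSONL, a format commonly fed to supervised
    fine-tuning of a student model."""
    records = []
    for prompt in prompts:
        records.append({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": teacher_model(prompt)},
            ]
        })
    return "\n".join(json.dumps(r) for r in records)

dataset = build_distillation_dataset(["What is model distillation?"])
print(dataset)
```

Trace-summarization countermeasures target exactly this loop: if the API returns only a summary of the model’s reasoning rather than the full trace, the harvested “assistant” field carries far less of the teacher’s distinctive behavior.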