Is DeepSeek AI Secretly Training on Google Gemini Outputs?

Published on June 5, 2025


Chinese AI lab DeepSeek is under renewed scrutiny following the release of its updated R1 model, with researchers suggesting it may have been trained on outputs from Google’s Gemini models.


Allegations Against DeepSeek

Developer Sam Paech pointed to linguistic similarities between DeepSeek’s R1-0528 and Gemini 2.5 Pro, claiming in a post on X that the model’s phrasing patterns suggest a switch from OpenAI-based to Gemini-generated synthetic data. Another developer behind the SpeechMap evaluation tool said DeepSeek’s internal “traces” resemble those of Gemini.
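The kind of phrasing comparison Paech describes can be approximated in a few lines: collect outputs from two models, break them into word n-grams, and measure how much their n-gram distributions overlap. This is a simplified illustration of the general idea, not Paech's actual methodology; the sample sentences below are invented for demonstration.

```python
from collections import Counter
from math import sqrt

def ngrams(text: str, n: int = 2) -> Counter:
    """Count lowercase word n-grams in a piece of text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors (0 to 1)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical model outputs for illustration only.
sample_a = "the model is happy to help with that request"
sample_b = "the model is happy to assist with that request"
score = cosine_similarity(ngrams(sample_a), ngrams(sample_b))
```

Researchers doing this at scale would compare distributions over millions of tokens, where characteristic word choices and sentence templates become statistically distinctive fingerprints of a model family.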

Concerns and Countermeasures

While model similarities don’t prove misuse – many AIs echo common phrasing due to web content saturation – experts say the risk of “AI slop” in training data is growing. As a countermeasure, OpenAI and others have begun limiting API access and summarizing model traces to hinder unauthorized distillation.
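The "distillation" these countermeasures target works roughly like this: a student model is trained to match a teacher model's output probability distribution rather than raw labels. A minimal sketch, assuming toy logit vectors rather than any real model's outputs:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's distribution to the student's.
    A smaller value means the student mimics the teacher more closely."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative logits only: a student close to the teacher scores a
# lower loss than one that disagrees with it.
teacher = [2.0, 1.0, 0.1]
close_student = [1.9, 1.1, 0.2]
far_student = [0.1, 1.0, 2.0]
```

This is why labs now summarize or obscure model traces: full token-level probability outputs are exactly the training signal a distilling competitor would want.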


“DeepSeek is short on GPUs and flush with cash,” said AI2 researcher Nathan Lambert. “Using synthetic data from top-tier models would be a logical shortcut.”

From DeepSeek to Distillation: Protecting IP in an AI World
