Data Privacy Concerns: DeepSeek AI's Clash with Google's Gemini Model

Published On Thu Jun 05 2025
DeepSeek AI Accused of Unauthorized Use of Google's Gemini Model

DeepSeek, a Chinese AI laboratory, has come under scrutiny for allegedly training its upgraded reasoning model, R1-0528, on outputs from Google's (GOOGL) Gemini model without authorization. Despite R1-0528's impressive performance across various benchmarks, concerns have been raised about the lack of transparency surrounding its training data sources.

Allegations of Misuse

AI developer Sam Paech drew attention on X to similarities between the vocabulary and sentence structure of R1-0528 and Google's latest Gemini 2.5 Pro, raising suspicions that Gemini's outputs may have been misused. Additionally, the anonymous developer behind SpeechMap pointed out that the reasoning traces of DeepSeek's model bear a striking resemblance to content generated by Gemini, casting doubt on the originality of the training data.
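At its simplest, the kind of vocabulary comparison described above can be sketched as a bag-of-words cosine similarity between two models' outputs. This is a hypothetical simplification for illustration only; real stylometric analyses of model outputs use far more sophisticated features than raw word counts.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts.

    Returns 1.0 for identical vocabulary distributions and 0.0 when
    the texts share no words at all.
    """
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy stand-ins for outputs from two different models.
output_model_1 = "the model reasons step by step before answering"
output_model_2 = "the model reasons carefully step by step"
print(cosine_similarity(output_model_1, output_model_2))
```

A high score between large samples of two models' outputs would be suggestive, though never conclusive, evidence of shared training data.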

History of Accusations

This is not the first time DeepSeek has faced accusations of unauthorized data usage. In December 2024, concerns were raised that its V3 model had been trained on OpenAI chat logs, as the model frequently identified itself as ChatGPT. OpenAI later said it had found evidence that DeepSeek may have used distillation, a technique in which a smaller model is trained on the outputs of a more advanced one, in violation of OpenAI's terms of service. Around the same time, Microsoft (MSFT), OpenAI's partner, detected large volumes of data being exfiltrated through OpenAI developer accounts believed to be linked to DeepSeek.
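The distillation technique mentioned above boils down to querying a stronger "teacher" model and recording its outputs as supervised training examples for a smaller "student" model. The sketch below is a minimal illustration under assumed names (`build_distillation_dataset`, `mock_teacher` are hypothetical); in practice the teacher would be a model API and the resulting file would feed a fine-tuning pipeline.

```python
import json

def build_distillation_dataset(prompts, teacher):
    """Collect teacher-model completions as (prompt, completion) pairs,
    the raw material for fine-tuning a smaller student model."""
    examples = []
    for prompt in prompts:
        completion = teacher(prompt)  # in practice: an API call to the larger model
        examples.append({"prompt": prompt, "completion": completion})
    return examples

def to_jsonl(examples):
    """Serialize examples in the JSON-Lines format commonly used
    for supervised fine-tuning data."""
    return "\n".join(json.dumps(e) for e in examples)

# Stand-in for a real teacher model's API (hypothetical).
mock_teacher = lambda p: f"Answer to: {p}"
dataset = build_distillation_dataset(["What is distillation?"], mock_teacher)
print(to_jsonl(dataset))
```

Because the student's training data consists entirely of the teacher's outputs, this is precisely the practice that most model providers' terms of service prohibit when the teacher is a competitor's model.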

Enhancing Data Protection

In response to these incidents, both OpenAI and Google are working on safeguards to prevent unauthorized use of their models' outputs. However, the sheer volume of AI-generated content online makes it increasingly difficult to filter such material out of training data, complicating efforts to prevent misuse.