Unveiling DeepSeek R1: The Game-Changer in AI Technology

China's DeepSeek model is a major advance in AI technology

Last week, DeepSeek, a startup company based in Hangzhou, China, released its newest artificial intelligence model, DeepSeek R1. Within days, the company's chatbot became the most-downloaded app in Apple’s App Store.

Performance of DeepSeek R1

DeepSeek R1’s performance meets or exceeds that of state-of-the-art AI models from American companies such as Meta and OpenAI. On most standard benchmarks, it surpasses all previously available open-source models and many closed models. The achievement sent shockwaves through Wall Street, wiping out approximately $1 trillion in corporate market value in one day.

It also deals a major blow to US plans to sustain AI dominance, which are part of the US objective of preventing China from supplanting it as the top economic and military power in the world.

Comparison with Industry-Leading Models

The DeepSeek team tested its R1 model on 21 benchmarks and compared the results to those achieved by industry-leading AI models from Meta, OpenAI, and others. The benchmarks included English-language, Chinese-language, software-programming, and mathematics tasks.

DeepSeek R1 outperformed the other models on 12 of the 21 benchmarks. For the remaining nine benchmarks, it placed second on eight and fourth on one.

Efficiency and Model Construction

What makes the DeepSeek achievement particularly dramatic is the massive reduction in the computational resources needed to build R1. Building R1 required approximately 2.8 million GPU hours on NVIDIA's H800 graphics card.
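
To give a feel for the scale of the roughly 2.8 million hours of H800 compute quoted above, the short sketch below converts that total into wall-clock time and a rental bill. The cluster size and the hourly price are hypothetical values picked only to show the arithmetic; neither comes from DeepSeek, and the calculation assumes the work divides perfectly across all cards.

```python
GPU_HOURS = 2_800_000  # approximate figure quoted in the article

# Hypothetical values, chosen only to illustrate the arithmetic.
ASSUMED_CLUSTER_GPUS = 2_000
ASSUMED_PRICE_PER_GPU_HOUR = 2.00  # US dollars, not a quoted price


def wall_clock_days(gpu_hours, gpus):
    """Days of continuous running if work splits evenly across all GPUs."""
    return gpu_hours / gpus / 24


def rental_cost(gpu_hours, price_per_hour):
    """Total cost if every GPU hour is billed at a flat hourly rate."""
    return gpu_hours * price_per_hour


days = wall_clock_days(GPU_HOURS, ASSUMED_CLUSTER_GPUS)
cost = rental_cost(GPU_HOURS, ASSUMED_PRICE_PER_GPU_HOUR)
print(f"{days:.1f} days on {ASSUMED_CLUSTER_GPUS} GPUs, about ${cost:,.0f}")
```

Under these assumed inputs, the total works out to about 58 days of continuous running and about $5.6 million. A different cluster size or hourly price changes the result accordingly.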

DeepSeek R1 is open source, meaning that the full set of 671 billion parameters and the software used to operate the model are freely available to download, inspect, and modify.

Cost of Operation

DeepSeek also charges far less for usage of R1 than its competitors do. Running R1 through application programming interface (API) calls over the internet is far cheaper than running other leading AI models the same way.

To lower the cost of operating R1, DeepSeek uses an architecture called “Mixture of Experts.” This means that for each token generated, only a fraction of the model is activated, reducing the computing power required to produce output.
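
The idea of activating only part of the model per token can be illustrated with a toy routing example. This is a minimal sketch of generic top-k expert routing, not DeepSeek's actual implementation: the expert count, the value of `top_k`, the scaling "experts," and the random router scores are all assumptions made for illustration. In a real model, the router scores would be computed from the token by a learned layer.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}


def moe_layer(token, experts, router_scores, top_k):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_token(router_scores, top_k)
    output = 0.0
    for index, weight in gates.items():
        output += weight * experts[index](token)
    return output, sorted(gates)


# Toy setup (hypothetical): 8 experts, each of which just scales its input.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

# Stand-in for a learned router: fixed pseudo-random scores for one token.
rng = random.Random(0)
router_scores = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

output, active = moe_layer(1.0, experts, router_scores, TOP_K)
active_fraction = len(active) / NUM_EXPERTS
print(f"active experts: {active}, fraction used: {active_fraction:.2f}")
```

In this toy setup, only two of the eight experts run for the token, so a quarter of the expert computation is performed. The same principle, at a much larger scale, is what lets a Mixture of Experts model hold many parameters while using only some of them for each token.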

Implications and Industry Impact

The DeepSeek achievement immediately overshadowed the planned $500 billion Stargate initiative and OpenAI’s plans for o3-mini, turning the AI industry on its head.

The perception that the US has a long lead in AI has vanished practically overnight, raising questions about the ability of the US to create or maintain dominance in AI. DeepSeek and its R1 model have become the central topic of conversation, shifting the work focus of vast swathes of the AI industry.