Google DeepMind Launches Gemini Robotics On-Device
On June 24, Google DeepMind unveiled Gemini Robotics On-Device, its first robotics model that runs entirely on the robot itself, with no reliance on cloud connectivity. This vision-language-action (VLA) model, built on the Gemini 2.0 architecture, combines visual recognition, natural language understanding, and action generation, enabling robots to perform dexterous tasks such as folding clothes, unzipping zippers, and even tying shoelaces while fully offline.
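To make the VLA idea concrete, the sketch below shows the kind of input/output contract such a model exposes: a camera frame plus a language instruction go in, and a motor command comes out. This is an illustrative Python sketch only; the type names and fields are hypothetical and do not reflect the actual Gemini Robotics interface.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    rgb_image: np.ndarray   # H x W x 3 camera frame (the "vision" input)
    instruction: str        # natural-language task (the "language" input)

@dataclass
class Action:
    joint_targets: np.ndarray  # target joint positions for both arms
    gripper: float             # 0.0 = fully open, 1.0 = fully closed

def control_step(policy, obs: Observation) -> Action:
    # A VLA model fuses both modalities into a single motor command;
    # `policy` stands in for the on-device network, which runs locally
    # with no round-trip to a cloud endpoint.
    return policy(obs)
```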

Performance and Features
Through lightweight design and algorithmic optimization, Gemini Robotics On-Device delivers performance on local hardware comparable to cloud-based models, while significantly outperforming other on-device solutions. The robot hardware it runs on pairs high-density tactile sensors with an IP67 waterproof rating, allowing operation in up to 1 meter of water for 30 minutes and suiting it to demanding environments such as kitchens and bathrooms. Mobility has also advanced: walking speed is 60% faster than the previous generation, balance precision rivals that of a professional gymnast, and dual-arm coordination supports precise operations such as industrial assembly.

Adaptability and Development
As Google's first VLA model opened up for fine-tuning, Gemini Robotics On-Device can adapt to new scenarios with as few as 50 to 100 task demonstrations. The accompanying Gemini Robotics SDK integrates the MuJoCo physics simulator, letting developers test the model in a virtual environment; access is granted through the "Trusted Tester Program." Industry observers have called this initiative the "Android of robotics," with the potential to decouple hardware from software and lower development barriers across the industry.
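Because the SDK ships with MuJoCo, a developer can already prototype the test loop in the open-source simulator itself. The sketch below is a minimal, self-contained example using only the public `mujoco` Python bindings; the simple PD controller merely stands in for the model's action output, and nothing here reflects the actual Gemini Robotics SDK API.

```python
import mujoco

# Minimal MJCF scene: one hinge joint driven by a motor actuator.
XML = """
<mujoco>
  <worldbody>
    <body>
      <joint name="hinge" type="hinge"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0 0 0.3"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" name="motor"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Drive the joint toward zero for one simulated second. A real test
# would instead query the VLA policy for each control step.
steps = int(1.0 / model.opt.timestep)
for _ in range(steps):
    data.ctrl[:] = -5.0 * data.qpos - 0.5 * data.qvel  # PD stand-in policy
    mujoco.mj_step(model, data)

print("final joint angle (rad):", float(data.qpos[0]))
```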

Applications in Various Sectors
In the industrial sector, the model has been adapted to the dual-arm Franka FR3 robot for tasks such as conveyor-belt assembly and quality inspection. In the home, it handles everyday services such as cooking and walking the dog through natural language interaction, and can even prepare breakfast ahead of time based on a user's schedule. For safety, Google layers semantic safety reviews via the Gemini Live API on top of hardware-level limits on action force and speed, forming a multi-tiered protection system.
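The hardware-level limits are the simplest layer to picture: whatever the model commands, the controller clips it to a fixed speed and force envelope before it reaches the motors. Below is a minimal Python sketch of that idea; the limit values and function name are hypothetical, not taken from Google's implementation.

```python
import numpy as np

# Hypothetical per-joint limits; real values would come from the
# robot's datasheet and safety certification, not from the model.
MAX_JOINT_SPEED = 1.0    # rad/s
MAX_JOINT_TORQUE = 20.0  # N*m

def enforce_hardware_limits(velocity_cmd: np.ndarray,
                            torque_cmd: np.ndarray):
    """Final defense layer, applied after any semantic safety review.

    Clamps every commanded value so that even a badly planned action
    cannot exceed the physical speed and force envelope.
    """
    safe_vel = np.clip(velocity_cmd, -MAX_JOINT_SPEED, MAX_JOINT_SPEED)
    safe_torque = np.clip(torque_cmd, -MAX_JOINT_TORQUE, MAX_JOINT_TORQUE)
    return safe_vel, safe_torque
```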

Future Outlook
Although the model is currently built on Gemini 2.0, the core team has already begun integrating performance enhancements from the newer Gemini 2.5, which could bring further breakthroughs in multi-step logical planning. As on-device AI models become more widespread, the robotics industry is shifting from cloud dependency to edge intelligence, and the launch of Gemini Robotics On-Device marks the start of embodied intelligence entering large-scale deployment.