Introduction to Google Cloud Run GPU
Google Cloud recently announced its Cloud Run GPU service, which lets users attach NVIDIA L4 GPUs to workloads in a cloud-native, serverless environment. It is aimed at AI workloads such as inference and model training.
Key Features of Cloud Run GPU
The Cloud Run GPU service offers seamless auto-scaling and flexible deployment. Notably, users do not need to predefine GPU configurations: the system allocates GPU resources dynamically based on computational demand and scales down when traffic drops, avoiding idle resources and unnecessary cost. This automation simplifies both deployment and day-to-day operations.
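As a rough sketch of what such a deployment might look like, the gcloud CLI exposes GPU-related flags for Cloud Run. The service name, image path, and region below are placeholders, and the exact flag names may vary across gcloud releases:

```shell
# Hypothetical deployment sketch: service name, image, and region are
# placeholders, not real resources. Flag names reflect recent gcloud
# releases and may differ in yours.
gcloud run deploy my-inference-service \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling \
  --max-instances=3
```

Note that GPU-enabled Cloud Run services generally require CPU to be always allocated (hence `--no-cpu-throttling`), and `--max-instances` bounds how far auto-scaling can fan out.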
Billing Model and Performance
Users of the Cloud Run GPU service benefit from a per-second billing model: charges stop accruing when the GPUs are not in use. In addition, the GPUs and their drivers can initialize from a cold start in roughly five seconds. For example, when serving the 4-billion-parameter Gemma 3 model, the time from cold start to generating the first token is roughly 19 seconds, illustrating the platform's fast startup.
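To make the per-second model concrete, a back-of-the-envelope cost estimate can be scripted. The rate below is a hypothetical placeholder, not a published Google Cloud price:

```shell
# Hypothetical per-second cost estimate. RATE is a placeholder value,
# not an actual Google Cloud price; check current pricing before use.
RATE=0.0003      # assumed $/second for one L4 GPU instance
SECONDS=120      # total seconds the instance actually served traffic
awk -v r="$RATE" -v s="$SECONDS" \
  'BEGIN { printf "Estimated cost: $%.4f\n", r * s }'
```

Because billing stops when the instance scales to zero, only the seconds spent serving traffic enter the calculation.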
Integration and Reliability
Cloud Run GPU integrates directly with existing services: GPU acceleration can be enabled with a command-line flag or a toggle on the service's page in the Google Cloud console. Google Cloud attributes the service's operational reliability to its elastic architecture. Users and enterprises can deploy across multiple regions as business needs dictate, and can disable zonal redundancy where a lower-cost, best-effort option better suits their resource allocation.
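A sketch of both points, again with placeholder names: the same service is deployed to a second region, with GPU zonal redundancy turned off. The flag name reflects recent gcloud releases and may change:

```shell
# Hypothetical sketch: deploy the same service to a European region and
# opt out of GPU zonal redundancy (a lower-cost, best-effort option).
# Service name, image, and region are placeholders.
gcloud run deploy my-inference-service \
  --image=us-docker.pkg.dev/my-project/my-repo/inference:latest \
  --region=europe-west1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-gpu-zonal-redundancy
```

Repeating the deploy per region is how Cloud Run handles multi-region setups; traffic can then be steered across regions with a load balancer.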
Availability
The Cloud Run GPU service is currently live in various Google Cloud regions across the United States, Europe, and Asia, providing users with access to powerful AI capabilities.