Gemma 4 is live on Modular Cloud with day-zero support and the fastest Gemma 4 inference on both NVIDIA and AMD GPUs. MAX delivers 15% higher throughput than vLLM on B200, and we're the only inference provider shipping Gemma 4 on a framework we built ourselves.
Two multimodal models live now:
- Gemma 4 31B: dense, 256K-token context, built for deep reasoning across large inputs
- Gemma 4 26B A4B: mixture-of-experts, 26B total parameters, only 4B active per forward pass
Both handle text, images, and video natively.
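Curious what a multimodal call looks like? Here's a minimal sketch assuming an OpenAI-compatible endpoint; the base URL, API key, and model id are illustrative placeholders, not published values:

```python
from openai import OpenAI

# Placeholder endpoint and key; swap in your Modular Cloud values.
client = OpenAI(
    base_url="https://example.modular.cloud/v1",  # hypothetical URL
    api_key="YOUR_API_KEY",
)

# One request mixing text and an image; the model id is a hypothetical tag.
response = client.chat.completions.create(
    model="gemma-4-31b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this chart in two sentences."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```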
Modular Cloud runs on MAX, our inference framework that unifies GPU kernels, graph compilation, and high-performance serving in a single hardware-agnostic stack. When a new architecture drops, we don't wait on upstream support or port hand-tuned kernels. We went from new weights to SOTA performance on two hardware platforms in days.
NVIDIA B200 or AMD MI355X. Same stack, same API. Pick the price-performance point that fits your workload.
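What "same stack, same API" means in practice: a sketch assuming per-hardware endpoints (the URLs and model id below are placeholders, not published values). The client code doesn't change between B200 and MI355X:

```python
from openai import OpenAI

# Hypothetical per-hardware endpoints; real URLs come from your deployment.
ENDPOINTS = {
    "b200": "https://b200.example.modular.cloud/v1",
    "mi355x": "https://mi355x.example.modular.cloud/v1",
}

def ask(hardware: str, prompt: str) -> str:
    # Same client, same request shape; only the base URL differs.
    client = OpenAI(base_url=ENDPOINTS[hardware], api_key="YOUR_API_KEY")
    resp = client.chat.completions.create(
        model="gemma-4-26b-a4b",  # hypothetical model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

for hw in ("b200", "mi355x"):
    print(hw, "->", ask(hw, "One sentence on why MoE keeps serving cheap."))
```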
→ Try Gemma 4 for free in the playground.
→ Read the full breakdown.
→ Deploy Gemma 4 on a dedicated Modular Cloud endpoint.
Which model are you planning to try first? Let us know in the thread!
